Find Nearest Indices for One Array Against All Values in Another Array - Python/Numpy

Find nearest indices for one array against all values in another array - Python / NumPy

Here's one vectorized approach with np.searchsorted based on this post -

def closest_argmin(A, B):
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx==L] = L-1
    mask = (sorted_idx > 0) & \
    ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    return sidx_B[sorted_idx-mask]

Brief explanation :

Get the sorted indices for the left positions. We do this with - np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). Now, searchsorted expects sorted array as the first input, so we need some preparatory work there.
Compare the values at those left positions with the values at their immediate right positions (left + 1) and see which one is closest. We do this at the step that computes mask.
Based on whether the left ones or their immediate right ones are closest, choose the respective ones. This is done with the subtraction of indices with the mask values acting as the offsets being converted to ints.

Benchmarking

Original approach -

def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1

Timings and verification -

In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...: 

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True

50x+ speedup for the posted sample and hopefully more for larger datasets!

Find nearest value in numpy array

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

Example usage:

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

print(find_nearest(array, value=0.5))
# 0.568743859261

Find indices of numpy array based on values in another numpy array

Taking into account the proposed options on the comments, and adding an extra option with numpy's in1d option:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0]   # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1]   # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0]   # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True

If you time it, you can see that all of them are quite similar, but @Brenlla's option is the fastest one

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loo

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop

Vectorize finding closest value in an array for each element in another array

For example, you can compute all the differences in on go with:

differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))

And use argmin and fancy indexing along with np.diagonal to get desired indices and differences:

indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])

So for

>>> known_array = np.array([-24, -18, -13, -30,  29])
>>> test_array = np.array([-6,  4, -6,  4,  8, -4,  8, -6,  2,  8])

One get

>>> indices
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> residual
array([ 7, 17,  7, 17, 21,  9, 21,  7, 15, 21])

Finding nearest element of an array in a particular column of another array

Looking up lots of items in an array is easiest done when it is sorted. You can delegate most of the work to np.searchsorted. Since we want to find elements in arr_1, it is the only array that needs to be sorted. I suspect that having a sorted arr_2 will speed things up by reducing the size of the search space for every successive element.

First, find the insertion points where arr_2 would end up in arr_1:

indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])

Now all you have to do is check for cases where the prior element is closer than the current one. There are two corner cases: when index is 0, you have to accept it, and when it is arr_1.size, you have to take the prior.

indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1

Doing it in this order saves you the trouble of messing with temporary arrays. The first line ensures that the index arr_1[indices, 1] is always valid. Since index -1 is valid, the second line succeeds as well.

The final result is then

np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)

If arr_1 is not already sorted, you can do the following:

arr_1 = arr1[np.argsort(arr_1[:, 1]), :]

A quick benchmark shows that on my very moderately powered machine, this approach takes ~300ms for arr_1.shape = (500000, 2) and arr_2.shape = (300000, 2).

Numpy find identical elements from list of arrays and another array

Assuming a is a list of arrays, you can use broadcasting to perform the comparisons of all elements:

out = [x[(x == b[:,None]).all(2).any(0)] for x in a]

Output:

[array([[3, 4, 5]]),
 array([[5, 5, 5],
        [9, 3, 3]])]

Indices:

[np.where((x == b[:,None]).all(2).any(0))[0]  for x in a]

Output:

[array([2]), array([2, 3])]

Numpy: For every element in one array, find the index in another array

As Joe Kington said, searchsorted() can search element very quickly. To deal with elements that are not in x, you can check the searched result with original y, and create a masked array:

import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)

yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

result = np.ma.array(yindex, mask=mask)
print result

the result is:

[-- 3 1 -- -- 6]

Replace elements in numpy array with closest value in another array

You have to calculate the difference between each element in aa and bb, and take the minimum:

aa_nearest = bb[abs(aa[None, :] - bb[:, None]).argmin(axis=0)]

Find all indexes of a numpy array closest to a value

The following function will return a fractional index showing approximately when the value is crossed:

def FindValueIndex(seq, val):
    r = np.where(np.diff(np.sign(seq - val)) != 0)
    idx = r + (val - seq[r]) / (seq[r + np.ones_like(r)] - seq[r])
    idx = np.append(idx, np.where(seq == val))
    idx = np.sort(idx)
    return idx

The logic: Find where sign of seq - val is changing. Take the value one index below and above the transition and interpolate.Add to this index where the value is actually equals the value.

If you want an integer index just use np.round. You can also choose np.floor or np.ceil to round the index to your preference.

def FindValueIndex(seq, val):
    r = np.where(np.diff(np.sign(seq - val)) != 0)
    idx = r + (val - seq[r]) / (seq[r + np.ones_like(r)] - seq[r])
    idx = np.append(idx, np.where(seq == val))
    idx = np.sort(idx)
    return np.round(idx)

Find Nearest Indices for One Array Against All Values in Another Array - Python/Numpy