Find nearest indices for one array against all values in another array - Python / NumPy
Here's one vectorized approach with np.searchsorted
based on this post
-
def closest_argmin(A, B):
L = B.size
sidx_B = B.argsort()
sorted_B = B[sidx_B]
sorted_idx = np.searchsorted(sorted_B, A)
sorted_idx[sorted_idx==L] = L-1
mask = (sorted_idx > 0) & \
((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
return sidx_B[sorted_idx-mask]
Brief explanation :
Get the sorted indices for the left positions. We do this with -
np.searchsorted(arr1, arr2, side='left')
or justnp.searchsorted(arr1, arr2)
. Now,searchsorted
expects sorted array as the first input, so we need some preparatory work there.Compare the values at those left positions with the values at their immediate right positions
(left + 1)
and see which one is closest. We do this at the step that computesmask
.Based on whether the left ones or their immediate right ones are closest, choose the respective ones. This is done with the subtraction of indices with the
mask
values acting as the offsets being converted toints
.
Benchmarking
Original approach -
def org_app(myArray, refArray):
out1 = np.empty(myArray.size, dtype=int)
for i, value in enumerate(myArray):
# find_nearest from posted question
index = find_nearest(refArray, value)
out1[i] = index
return out1
Timings and verification -
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
50x+
speedup for the posted sample and hopefully more for larger datasets!
Find nearest value in numpy array
import numpy as np
def find_nearest(array, value):
array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
Example usage:
array = np.random.random(10)
print(array)
# [ 0.21069679 0.61290182 0.63425412 0.84635244 0.91599191 0.00213826
# 0.17104965 0.56874386 0.57319379 0.28719469]
print(find_nearest(array, value=0.5))
# 0.568743859261
Find indices of numpy array based on values in another numpy array
Taking into account the proposed options on the comments, and adding an extra option with numpy's in1d option:
>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0] # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1] # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0] # Option of @jdehesa
>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True
If you time it, you can see that all of them are quite similar, but @Brenlla's option is the fastest one
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loo
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
Vectorize finding closest value in an array for each element in another array
For example, you can compute all the differences in on go with:
differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))
And use argmin
and fancy indexing along with np.diagonal
to get desired indices and differences:
indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])
So for
>>> known_array = np.array([-24, -18, -13, -30, 29])
>>> test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
One get
>>> indices
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> residual
array([ 7, 17, 7, 17, 21, 9, 21, 7, 15, 21])
Finding nearest element of an array in a particular column of another array
Looking up lots of items in an array is easiest done when it is sorted. You can delegate most of the work to np.searchsorted
. Since we want to find elements in arr_1
, it is the only array that needs to be sorted. I suspect that having a sorted arr_2
will speed things up by reducing the size of the search space for every successive element.
First, find the insertion points where arr_2
would end up in arr_1
:
indices = np.searchsorted(arr_1[:, 1], arr_2[:, 1])
Now all you have to do is check for cases where the prior element is closer than the current one. There are two corner cases: when index is 0, you have to accept it, and when it is arr_1.size
, you have to take the prior.
indices[indices == arr_1.shape[0]] = arr_1.shape[0] - 1
indices[(indices != 0) & (arr_1[indices, 1] - arr_2[:, 1] > arr_2[:, 1] - arr_1[indices - 1, 1])] -= 1
Doing it in this order saves you the trouble of messing with temporary arrays. The first line ensures that the index arr_1[indices, 1]
is always valid. Since index -1 is valid, the second line succeeds as well.
The final result is then
np.concatenate((arr_2, arr_1[indices, 0:1]), axis=1)
If arr_1
is not already sorted, you can do the following:
arr_1 = arr1[np.argsort(arr_1[:, 1]), :]
A quick benchmark shows that on my very moderately powered machine, this approach takes ~300ms for arr_1.shape = (500000, 2)
and arr_2.shape = (300000, 2)
.
Numpy find identical elements from list of arrays and another array
Assuming a
is a list of arrays, you can use broadcasting to perform the comparisons of all elements:
out = [x[(x == b[:,None]).all(2).any(0)] for x in a]
Output:
[array([[3, 4, 5]]),
array([[5, 5, 5],
[9, 3, 3]])]
Indices:
[np.where((x == b[:,None]).all(2).any(0))[0] for x in a]
Output:
[array([2]), array([2, 3])]
Numpy: For every element in one array, find the index in another array
As Joe Kington said, searchsorted() can search element very quickly. To deal with elements that are not in x, you can check the searched result with original y, and create a masked array:
import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])
index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)
yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y
result = np.ma.array(yindex, mask=mask)
print result
the result is:
[-- 3 1 -- -- 6]
Replace elements in numpy array with closest value in another array
You have to calculate the difference between each element in aa
and bb
, and take the minimum:
aa_nearest = bb[abs(aa[None, :] - bb[:, None]).argmin(axis=0)]
Find all indexes of a numpy array closest to a value
The following function will return a fractional index showing approximately when the value is crossed:
def FindValueIndex(seq, val):
r = np.where(np.diff(np.sign(seq - val)) != 0)
idx = r + (val - seq[r]) / (seq[r + np.ones_like(r)] - seq[r])
idx = np.append(idx, np.where(seq == val))
idx = np.sort(idx)
return idx
The logic: Find where sign of seq - val is changing. Take the value one index below and above the transition and interpolate.Add to this index where the value is actually equals the value.
If you want an integer index just use np.round. You can also choose np.floor or np.ceil to round the index to your preference.
def FindValueIndex(seq, val):
r = np.where(np.diff(np.sign(seq - val)) != 0)
idx = r + (val - seq[r]) / (seq[r + np.ones_like(r)] - seq[r])
idx = np.append(idx, np.where(seq == val))
idx = np.sort(idx)
return np.round(idx)
Related Topics
How to Remove Item from a Python List in a Loop
How to Bind the Enter Key to a Function in Tkinter
Extract Email Sub-Strings from Large Document
How Can Pyspark Be Called in Debug Mode
How to Stack Vectors of Different Lengths in Numpy
How to Apply Itertools.Product to Elements of a List of Lists
Add an Image in a Specific Position in the Document (.Docx)
Printing Utf-8 in Python 3 Using Sublime Text 3
Python 2 CSV Writer Produces Wrong Line Terminator on Windows
How to Write String Literals in Python Without Having to Escape Them
How to Get 'Real-Time' Information Back from a Subprocess.Popen in Python (2.5)
How to Invoke Pandas.Rolling.Apply with Parameters from Multiple Column
How to Rotate Xticklabels in Matplotlib So That the Spacing Between Each Xticklabel Is Equal