Numpy: For every element in one array, find the index in another array
As Joe Kington said, searchsorted() can search for elements very quickly. To deal with elements of y that are not in x, you can check the searched result against the original y and create a masked array:
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)  # positions in the sorted copy of x
yindex = np.take(index, sorted_index, mode="clip")  # map back to indices in the original x
mask = x[yindex] != y  # True where the element of y is absent from x
result = np.ma.array(yindex, mask=mask)
print(result)
The result is:
[-- 3 1 -- -- 6]
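The masked array also makes it easy to pull out only the hits. A small self-contained sketch of that, reusing the same arrays (the variable names `found` and `matched_values` are mine, not from the answer):

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)
yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

# Indices in x for only the elements of y that were actually found
found = yindex[~mask]          # -> [3 1 6]
# The elements of y that had a match in x
matched_values = y[~mask]      # -> [1 5 6]
```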
Finding indices of matches of one array in another array
You can use np.in1d with np.nonzero -

np.nonzero(np.in1d(A,B))[0]

You can also use np.searchsorted, if you care about maintaining the order (note that this assumes A is already sorted) -

np.searchsorted(A,B)

For the generic case, when A & B are unsorted arrays, you can bring in the sorter option of np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would also add my favorite broadcasting approach to the mix, to solve the generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
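The sorter-based lookup above can be wrapped in a small reusable helper. This is a sketch, not part of the answer; the name `find_indices` is mine, and it assumes every element of B actually occurs in A:

```python
import numpy as np

def find_indices(A, B):
    """For each element of B, return its index in the (unsorted) array A.
    Assumes every element of B occurs in A."""
    sort_idx = A.argsort()
    return sort_idx[np.searchsorted(A, B, sorter=sort_idx)]

A = np.array([7, 5, 1, 6, 10, 9, 8])
B = np.array([1, 10, 7])
print(find_indices(A, B))  # -> [2 4 0]
```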
Get indices of element of one array using indices in another array
I have a solution similar to that of Andras, based on np.argmax and np.arange. Instead of "indexing the index", I propose adding a piecewise offset to the result of np.argmax:
import numpy as np
a = np.array([[[7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
>>> off
array([[0, 2],
[4, 6]])
This results in:
>>> a.argmax(-1) + off
array([[1, 2],
[4, 6]])
Or as a one-liner:
>>> a.argmax(-1) + np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
array([[1, 2],
[4, 6]])
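These flat indices can be fed straight back into the flattened array to recover the per-row maxima, which is a quick way to check the result. A small self-contained sketch (not from the answer):

```python
import numpy as np

a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])

off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
flat_idx = a.argmax(-1) + off

# Indexing the flattened array with the flat indices recovers the maxima,
# i.e. the same values as a.max(-1)
print(a.ravel()[flat_idx])  # -> [[ 9 19]
                            #     [24 18]]
```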
find array value that index matches another array values
Just map the tokenIds array:
const newstrings = tokenIds.map(i => strings[i]);
JavaScript - find index of array inside another array
To compare the actual values you can use the JSON.stringify() method:
const piece = [5, 10];
const array = [[5, 10], [5, 11]];

// Using findIndex
console.log(array.findIndex(function(element) {
  return JSON.stringify(element) == JSON.stringify(piece);
}));
Get indices for values of one array in another array
Given ref, new which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted version of both arrays and the invertibility of np.argsort.

Start with:

i = np.argsort(ref)
j = np.argsort(new)

Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:

k = np.argsort(i)

Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
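Putting it together on the example arrays confirms that the computed index maps new back to ref. A self-contained run (the exact values in order may vary if argsort resolves the tied 3s differently, so only new[order] is commented):

```python
import numpy as np

def inverse_argsort(a):
    # Invert a permutation: inv[fwd] = 0..n-1
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv

ref = np.array([5, 3, 1, 2, 3, 4])
new = np.array([3, 2, 4, 5, 3, 1])

order = np.argsort(new)[inverse_argsort(ref)]
print(order)       # a permutation of 0..5
print(new[order])  # -> [5 3 1 2 3 4], i.e. ref
```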
Find indices of numpy array based on values in another numpy array
Taking into account the options proposed in the comments, and adding extra options based on numpy's in1d:
>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0] # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1] # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0] # Option of @jdehesa
>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True
If you time them, you can see that all of them are quite similar, but @Brenlla's option is the fastest one:
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop
python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
Find indices of rows of numpy 2d array in another 2D array
Here is one way to achieve this. If any row of arr1 is not found in arr2, then the corresponding position in pos will hold the value -1 for simplicity. This heavily uses numpy broadcasting and indexing. Feel free to ask for further clarifications.
Original example:
import numpy as np
arr1 = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[4, 5, 6],
[1, 2, 3]])
arr2 = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
inds = arr1 == arr2[:, None]
row_sums = inds.sum(axis = 2)
i, j = np.where(row_sums == 3) # Check which rows match in all 3 columns
pos = np.ones(arr1.shape[0], dtype = 'int64') * -1
pos[j] = i
pos
array([0, 1, 2, 1, 0])
Example 2:
import numpy as np
arr1 = np.array([[1, 2, 4],
[4, 5, 6],
[7, 8, 9],
[4, 1, 6],
[1, 2, 3]])
arr2 = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
inds = arr1 == arr2[:, None]
row_sums = inds.sum(axis = 2)
i, j = np.where(row_sums == 3)
pos = np.ones(arr1.shape[0], dtype = 'int64') * -1
pos[j] = i
pos
array([-1, 1, 2, -1, 0])
If you have more columns, just change the line i, j = np.where(row_sums == 3) to i, j = np.where(row_sums == arr1.shape[1]).
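That generalization can be sketched as a small function. This is my own wrapper around the snippet above (the name `rows_pos` is hypothetical, not from the answer); it works for any number of columns:

```python
import numpy as np

def rows_pos(arr1, arr2):
    """For each row of arr1, the index of the matching row in arr2, or -1 if absent."""
    inds = arr1 == arr2[:, None]                  # broadcast: (len(arr2), len(arr1), ncols)
    row_sums = inds.sum(axis=2)                   # count of matching columns per row pair
    i, j = np.where(row_sums == arr1.shape[1])    # full-row matches, any column count
    pos = np.full(arr1.shape[0], -1, dtype='int64')
    pos[j] = i
    return pos

arr1 = np.array([[1, 2, 4], [4, 5, 6], [7, 8, 9], [4, 1, 6], [1, 2, 3]])
arr2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(rows_pos(arr1, arr2))  # -> [-1  1  2 -1  0]
```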
Find indexes of array that match a condition, then push into another array
Matching Condition
This answer presumes that you're looking to get the indices of the elements which are of equal length to the element in the array with the maximum length.
Getting the Indices
To get the length of the longest element, we first get the lengths of each element, then reduce the array so as to get only the maximum length.
Next, we map each element of the array to one of two things: if the element is the same length as the maximum length, we take its index; if it isn't, we map to false instead, so that we can filter those values out. Lastly, we filter out the values which are false, since those elements were not as long as the maximum length and we don't want them.
As T.J. Crowder alluded to, a forEach might do this in a way that's more direct, and potentially more readable. Experimenting with chaining array methods in this way will help you make a good decision about the kind of code that works best for the situation you are faced with.
If you are working with very large data sets, then iterating over your data multiple times in order to improve code readability, at the expense of performance, is probably a bad idea.
const arr = ["00", "000", "", "0", "000"]

const maxLength = arr
  .map(e => e.length)
  .reduce((a, b) => a > b ? a : b)

const result = arr
  .map((val, i) => val.length === maxLength ? i : false) // keep the index if the item has the maximum length
  .filter(i => i !== false) // remove the non-matching entries

console.dir(result) // [1, 4]