Finding Indices of Matches of One Array in Another Array

Numpy: For every element in one array, find the index in another array

As Joe Kington said, searchsorted() can search for elements very quickly. To deal with elements of y that are not in x, you can check the search result against the original y and create a masked array:

import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)

yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

result = np.ma.array(yindex, mask=mask)
print(result)

The result is:

[-- 3 1 -- -- 6]
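
If you need a plain integer array rather than a masked one, the masked result can be filled with a sentinel. A minimal sketch building on the code above; the choice of -1 as the "not found" value is arbitrary:

```python
import numpy as np

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)

yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

# Replace masked (not-found) entries with -1 to get a plain int array
result = np.ma.array(yindex, mask=mask).filled(-1)
print(result)  # [-1  3  1 -1 -1  6]
```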

Finding indices of matches of one array in another array

You can use np.in1d with np.nonzero -

np.nonzero(np.in1d(A,B))[0]

You can also use np.searchsorted, if you care about maintaining the order (note that this assumes A is sorted) -

np.searchsorted(A,B)

For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -

sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]

I would also add my favorite broadcasting-based approach to the mix, to solve a generic case -

np.nonzero(B[:,None] == A)[1]

Sample run -

In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])

In [126]: B
Out[126]: array([ 1, 10, 7])

In [127]: sort_idx = A.argsort()

In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])

In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])
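
Note that the two families of approaches order their output differently: the searchsorted-with-sorter version returns, for each element of B, its position in A, while the in1d/nonzero version returns indices in A's own order. On newer NumPy versions, np.isin is the recommended spelling of np.in1d. A sketch using the same sample arrays:

```python
import numpy as np

A = np.array([7, 5, 1, 6, 10, 9, 8])
B = np.array([1, 10, 7])

# Indices in A's own order (which positions of A hold a member of B)
print(np.nonzero(np.isin(A, B))[0])  # [0 2 4]

# Indices ordered by B (where each element of B sits in A)
sort_idx = A.argsort()
print(sort_idx[np.searchsorted(A, B, sorter=sort_idx)])  # [2 4 0]
```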

Get indices of element of one array using indices in another array

I have a solution similar to that of Andras, based on np.argmax and np.arange. Instead of "indexing the index", I propose adding a piecewise offset to the result of np.argmax:

import numpy as np
a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])
off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
>>> off
array([[0, 2],
       [4, 6]])

This results in:

>>> a.argmax(-1) + off
array([[1, 2],
       [4, 6]])

Or as a one-liner:

>>> a.argmax(-1) + np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
array([[1, 2],
       [4, 6]])
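
A quick way to sanity-check these flat indices is to index the raveled array with them and compare against the per-row maxima. A small verification sketch using the same array:

```python
import numpy as np

a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])

off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
flat_idx = a.argmax(-1) + off

# Indexing the flattened array with the offset argmax recovers the maxima
print(np.array_equal(a.ravel()[flat_idx], a.max(-1)))  # True
```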

find array value that index matches another array values

Just map the tokenIds array:

const newstrings = tokenIds.map(i => strings[i]);

JavaScript - find index of array inside another array

To compare the actual values you can use the JSON.stringify() method:

const piece = [5, 10];
const array = [[5, 10], [5, 11]];

// Using findIndex
console.log(array.findIndex(function (element) {
  return JSON.stringify(element) === JSON.stringify(piece);
}));

Get indices for values of one array in another array

Given ref and new, which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted versions of both arrays and the invertibility of np.argsort.

Start with:

i = np.argsort(ref)
j = np.argsort(new)

Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:

k = np.argsort(i)

Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with

order = np.argsort(new)[np.argsort(np.argsort(ref))]

From your original example:

>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])

This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:

def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv

order = np.argsort(new)[inverse_argsort(ref)]
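
As a quick check, the scatter-based inverse agrees with the double argsort on the example above. A small self-contained verification sketch:

```python
import numpy as np

def inverse_argsort(a):
    # Invert a sort permutation by scattering positions back through it
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv

ref = np.array([5, 3, 1, 2, 3, 4])
new = np.array([3, 2, 4, 5, 3, 1])

order = np.argsort(new)[inverse_argsort(ref)]
print(order)  # [3 0 5 1 4 2]

# inverse_argsort(ref) matches np.argsort(np.argsort(ref)), and new[order] recovers ref
print(np.array_equal(new[order], ref))  # True
```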

Find indices of numpy array based on values in another numpy array

Taking into account the options proposed in the comments, and adding an extra one using numpy's in1d:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0] # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1] # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0] # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True

If you time them, you can see that all of them are quite similar, but @Brenlla's option is the fastest one:

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop

Find indices of rows of numpy 2d array in another 2D array

Here is one way to achieve this. If a row of arr1 is not found in arr2, then pos will hold -1 at that location, for simplicity.

This relies heavily on NumPy broadcasting and indexing. Feel free to ask for further clarification.

Original example:

import numpy as np
arr1 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [4, 5, 6],
                 [1, 2, 3]])
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

inds = arr1 == arr2[:, None]
row_sums = inds.sum(axis=2)
i, j = np.where(row_sums == 3)  # Check which rows match in all 3 columns

pos = np.ones(arr1.shape[0], dtype='int64') * -1
pos[j] = i
pos
array([0, 1, 2, 1, 0])

Example 2:

import numpy as np
arr1 = np.array([[1, 2, 4],
                 [4, 5, 6],
                 [7, 8, 9],
                 [4, 1, 6],
                 [1, 2, 3]])
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

inds = arr1 == arr2[:, None]
row_sums = inds.sum(axis=2)
i, j = np.where(row_sums == 3)

pos = np.ones(arr1.shape[0], dtype='int64') * -1
pos[j] = i
pos
array([-1,  1,  2, -1,  0])

If you have more columns, just change the line i, j = np.where(row_sums == 3) to i, j = np.where(row_sums == arr1.shape[1]).
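
Alternatively, the column-count check can be avoided entirely by collapsing the comparison with all over the last axis, which works for any number of columns. A sketch under the same setup as Example 2:

```python
import numpy as np

arr1 = np.array([[1, 2, 4],
                 [4, 5, 6],
                 [7, 8, 9],
                 [4, 1, 6],
                 [1, 2, 3]])
arr2 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# True where a full row of arr1 equals a row of arr2, regardless of width
i, j = np.where((arr1 == arr2[:, None]).all(-1))

pos = np.full(arr1.shape[0], -1, dtype='int64')
pos[j] = i
print(pos)  # [-1  1  2 -1  0]
```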

Find indexes of array that match a condition, then push into another array

Matching Condition

This answer presumes that you're looking to get the indices of the elements which are of equal length to the element in the array with the maximum length.


Getting the Indices

To get the length of the longest element, we first get the lengths of each element, then reduce the array so as to get only the maximum length.

Next, we map each element of the array to one of two things; if the element is the same length as the maximum length, we take its index. If it isn't, then we map false instead. We do that so we can filter out those values.

Lastly, we filter out values which are false. These values were not as long as the maximum length, so we don't want them.


As T.J. Crowder alluded to, a forEach might do this in a more direct, and potentially more readable, way. Experimenting with chaining array methods in this way will allow you to make a good decision about the kind of code that will work best for the situation you are faced with.

If you are working with very large data sets, then iterating over your data multiple times in order to improve code readability, at the expense of performance, is probably a bad idea.

const arr = ["00", "000", "", "0", "000"]

const maxLength = arr
  .map(e => e.length)
  .reduce((a, b) => a > b ? a : b)

const result = arr
  .map((val, i) => val.length === maxLength ? i : false) // keep the index if the item's length equals the max
  .filter(i => i !== false) // remove non-matching entries

console.dir(result)

