Find the row indexes of several values in a numpy array
Approach #1
One approach would be to use NumPy broadcasting, like so -
np.where((X==searched_values[:,None]).all(-1))[1]
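A self-contained run of Approach #1 follows. It assumes the sample X used further below and a hypothetical searched_values, since the question's own searched_values is not shown here. Broadcasting compares every searched row against every row of X at once, which is why memory grows with len(searched_values) * len(X) * X.shape[1].

```python
import numpy as np

# Sample X (the same array used in the ravel_multi_index walkthrough)
X = np.array([[4, 2],
              [9, 3],
              [8, 5],
              [3, 3],
              [5, 6]])
# Hypothetical rows to look up; every one of them occurs in X
searched_values = np.array([[4, 2],
                            [3, 3],
                            [5, 6]])

# searched_values[:, None] has shape (3, 1, 2); against X's (5, 2) it broadcasts
# to (3, 5, 2), i.e. every searched row compared with every row of X
eq = X == searched_values[:, None]
# A row matches only if all of its elements match -> boolean (3, 5) table
hits = eq.all(-1)
# np.where returns (searched index, X index) pairs; [1] keeps the X row indices
out = np.where(hits)[1]
print(eq.shape, hits.shape, out.tolist())  # (3, 5, 2) (3, 5) [0, 3, 4]
```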
Approach #2
A memory-efficient approach would be to convert each row into its linear index equivalent and then use np.in1d, like so -
dims = X.max(0)+1
out = np.where(np.in1d(np.ravel_multi_index(X.T,dims),\
np.ravel_multi_index(searched_values.T,dims)))[0]
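A runnable version of Approach #2 on the same sample data is sketched below. One deliberate change, an assumption on my part: dims is taken over both X and searched_values, because np.ravel_multi_index raises a ValueError if a searched row holds a value outside the grid built from X alone. It also uses np.isin, which gives the same mask as np.in1d here and is the function recent NumPy releases recommend. Note that the indices come out in the order the rows appear in X, not in the order of searched_values.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
# Hypothetical query rows, deliberately listed out of X's order
searched_values = np.array([[5, 6], [4, 2]])

# Grid shape large enough for every row of both arrays (non-negative ints assumed)
dims = np.maximum(X.max(0), searched_values.max(0)) + 1
# One scalar per row: its linear index in a grid of shape dims
X1D = np.ravel_multi_index(X.T, dims)
sv1D = np.ravel_multi_index(searched_values.T, dims)
# Mark rows of X whose scalar occurs among the searched scalars
out = np.where(np.isin(X1D, sv1D))[0]
print(out.tolist())  # [0, 4] -> ordered by position in X, not by searched_values
```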
Approach #3
Another memory-efficient approach uses np.searchsorted, with the same idea of converting each row to its linear index equivalent, like so -
dims = X.max(0)+1
X1D = np.ravel_multi_index(X.T,dims)
searched_valuesID = np.ravel_multi_index(searched_values.T,dims)
sidx = X1D.argsort()
out = sidx[np.searchsorted(X1D,searched_valuesID,sorter=sidx)]
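Running Approach #3 on a hypothetical query shows how it differs from #2: the result follows the order of searched_values. argsort gives the permutation that sorts X1D, searchsorted finds each query's position in that sorted view, and sidx maps that position back to a row of X.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
searched_values = np.array([[5, 6], [4, 2]])  # hypothetical query rows

dims = X.max(0) + 1                                   # (10, 7) here
X1D = np.ravel_multi_index(X.T, dims)                 # [30 66 61 24 41]
searched_valuesID = np.ravel_multi_index(searched_values.T, dims)  # [41 30]
sidx = X1D.argsort()                                  # permutation that sorts X1D
# Position of each query in sorted X1D, mapped back to a row index of X
out = sidx[np.searchsorted(X1D, searched_valuesID, sorter=sidx)]
print(out.tolist())  # [4, 0]
```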
Please note that this np.searchsorted method assumes there is a match in X for each row of searched_values.
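If that assumption may not hold, one way to guard it (a sketch, not part of the original answer) is to clip the positions returned by searchsorted into range and then check that the candidate row really has the searched scalar, reporting -1 otherwise. The helper name find_row_indices and the -1 marker are hypothetical choices.

```python
import numpy as np

def find_row_indices(X, searched_values):
    """Hypothetical helper: row index in X for each searched row, or -1 if absent.
    Assumes non-negative integer arrays with the same number of columns."""
    dims = np.maximum(X.max(0), searched_values.max(0)) + 1
    X1D = np.ravel_multi_index(X.T, dims)
    sv1D = np.ravel_multi_index(searched_values.T, dims)
    sidx = X1D.argsort()
    pos = np.searchsorted(X1D, sv1D, sorter=sidx)
    # A value larger than everything in X1D gets pos == len(X1D); clip it so the
    # lookup below stays in bounds (the equality check then rejects it)
    cand = sidx[np.clip(pos, 0, len(X1D) - 1)]
    found = X1D[cand] == sv1D
    return np.where(found, cand, -1)

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
queries = np.array([[3, 3], [1, 1], [9, 3], [9, 6]])  # [1, 1] and [9, 6] are not in X
print(find_row_indices(X, queries).tolist())  # [3, -1, 1, -1]
```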
How does np.ravel_multi_index work?
This function gives us the linear index equivalent numbers. It accepts a 2D array of n-dimensional indices, set as columns (one index tuple per column), and the shape of the n-dimensional grid onto which those indices are mapped and for which the equivalent linear indices are computed.
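A small illustration of that calling convention: np.ravel_multi_index takes one index array per dimension, and a 2D array whose columns are the index tuples works the same way because it is unpacked along its first axis. np.unravel_index is the inverse mapping. The grid shape (10, 7) is the one used in the walkthrough below.

```python
import numpy as np

dims = (10, 7)
# Two index tuples, (4, 2) and (9, 3), given as one array per dimension
rows = np.array([4, 9])
cols = np.array([2, 3])
lin = np.ravel_multi_index((rows, cols), dims)
print(lin.tolist())  # [30, 66]

# The same tuples stored as the columns of a 2D array give the same result
tuples_as_columns = np.array([[4, 9],
                              [2, 3]])
print(np.ravel_multi_index(tuples_as_columns, dims).tolist())  # [30, 66]

# np.unravel_index maps the linear indices back to per-dimension indices
back_rows, back_cols = np.unravel_index(lin, dims)
print(back_rows.tolist(), back_cols.tolist())  # [4, 9] [2, 3]
```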
Let's use the inputs we have for the problem at hand. Take the case of input X and note its first row. Since we are trying to convert each row of X into its linear index equivalent, and since np.ravel_multi_index treats each column as one indexing tuple, we need to transpose X before feeding it into the function. Since the number of elements per row in X in this case is 2, the n-dimensional grid to be mapped onto would be 2D. With 3 elements per row in X, it would have been a 3D grid for mapping, and so on.
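To make the 3-column case concrete, here is a small hypothetical Y with three elements per row. Its transpose supplies three index arrays, so the rows are mapped onto a 3D grid of shape Y.max(0) + 1, and the row-major rule gains one more term.

```python
import numpy as np

Y = np.array([[1, 2, 3],
              [0, 0, 1],
              [1, 0, 0]])       # hypothetical 3-column data
dims = Y.max(0) + 1             # [2 3 4] -> a 2 x 3 x 4 grid
lin = np.ravel_multi_index(Y.T, dims)
# Row-major rule for a 3D grid: i * (3 * 4) + j * 4 + k
manual = Y[:, 0] * dims[1] * dims[2] + Y[:, 1] * dims[2] + Y[:, 2]
print(lin.tolist(), manual.tolist())  # [23, 1, 12] [23, 1, 12]
```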
To see how this function would compute linear indices, consider the first row of X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
We have the shape of the n-dimensional grid as dims -
In [78]: dims
Out[78]: array([10, 7])
Let's create the 2-dimensional grid to see how that mapping works and how linear indices get computed with np.ravel_multi_index -
In [79]: out = np.zeros(dims,dtype=int)
In [80]: out
Out[80]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Let's set the first indexing tuple from X, i.e. the first row of X, into the grid -
In [81]: out[4,2] = 1
In [82]: out
Out[82]:
array([[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0]])
Now, to see the linear index equivalent of the element just set, let's flatten the grid and use np.where to detect that 1.
In [83]: np.where(out.ravel())[0]
Out[83]: array([30])
This could also be computed by hand if row-major ordering is taken into account: the grid has 7 columns, so cell (4, 2) sits at 4 * 7 + 2 = 30.
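In row-major (C) order, a cell (i, j) of a grid with n columns sits at linear position i * n + j. Applying that rule by hand to every row of X gives the linear indices that np.ravel_multi_index should reproduce; the sketch below compares the two.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])
dims = X.max(0) + 1              # [10 7]
# i * n_cols + j for each row (i, j) of X
manual = X[:, 0] * dims[1] + X[:, 1]
print(manual.tolist())  # [30, 66, 61, 24, 41]
print((manual == np.ravel_multi_index(X.T, dims)).all())  # True
```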
Let's use np.ravel_multi_index and verify those linear indices -
In [84]: np.ravel_multi_index(X.T,dims)
Out[84]: array([30, 66, 61, 24, 41])
Thus, we would have linear indices corresponding to each indexing tuple from X, i.e. each row of X.
Choosing dimensions for np.ravel_multi_index to form unique linear indices
Now, the idea behind considering each row of X as an indexing tuple of an n-dimensional grid and converting each such tuple to a scalar is to have unique scalars corresponding to unique tuples, i.e. unique rows in X.
Let's take another look at X -
In [77]: X
Out[77]:
array([[4, 2],
       [9, 3],
       [8, 5],
       [3, 3],
       [5, 6]])
Now, as discussed in the previous section, we are considering each row as an indexing tuple. Within each such indexing tuple, the first element represents the first axis of the n-dim grid, the second element the second axis, and so on until the last element of each row in X. In essence, each column represents one dimension or axis of the grid. If we are to map all elements from X onto the same n-dim grid, we need to consider the maximum stretch of each axis of such a proposed grid. Assuming X holds non-negative integers, that stretch would be the maximum of each column in X + 1. The + 1 is needed because Python uses 0-based indexing, so a column whose maximum is m needs m + 1 positions. For example, X[1,0] == 9 would map to the 10th row of the proposed grid. Similarly, X[4,1] == 6 would go to the 7th column of that grid.
So, for our sample case, we had -
In [7]: dims = X.max(axis=0) + 1 # Or simply X.max(0) + 1
In [8]: dims
Out[8]: array([10, 7])
Thus, we would need a grid with a shape of at least (10,7) for our sample case. Larger lengths along the dimensions won't hurt; they would still give us unique linear indices.
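A quick check of both points on the sample X: a larger grid such as (12, 9), chosen arbitrarily, still yields distinct linear indices (only the values change), while a grid too short along an axis makes np.ravel_multi_index raise a ValueError under its default mode='raise'.

```python
import numpy as np

X = np.array([[4, 2], [9, 3], [8, 5], [3, 3], [5, 6]])

big = np.ravel_multi_index(X.T, (12, 9))   # arbitrary, larger than needed
print(big.tolist(), len(np.unique(big)) == len(X))   # [38, 84, 77, 30, 51] True

try:
    np.ravel_multi_index(X.T, (9, 7))      # first axis too short for X[1, 0] == 9
except ValueError as err:
    print("ValueError:", err)
```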
Concluding remarks: one important thing to note here is that if we have negative numbers in X, we need to add proper offsets along each column in X to make those indexing tuples non-negative before using np.ravel_multi_index.
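One way to apply such offsets, sketched here as an assumption rather than taken from the answer: subtract the per-column minimum over both X and the searched rows, so the smallest value in every column becomes 0, then build dims from the shifted arrays. The data below is hypothetical.

```python
import numpy as np

X = np.array([[-4, 2], [9, -3], [8, 5], [-3, 3]])   # hypothetical data with negatives
searched_values = np.array([[8, 5], [-4, 2]])

# Per-column offset so every column starts at 0 (shared by both arrays)
offset = np.minimum(X.min(0), searched_values.min(0))
Xs = X - offset
Ss = searched_values - offset
dims = np.maximum(Xs.max(0), Ss.max(0)) + 1
X1D = np.ravel_multi_index(Xs.T, dims)
S1D = np.ravel_multi_index(Ss.T, dims)
out = np.where(np.isin(X1D, S1D))[0]
print(out.tolist())  # [0, 2]
```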
Getting the indices of several elements in a NumPy array at once
You could use in1d and nonzero (or where for that matter):
>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])
This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.
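To see the ordering issue concretely, take the a and b from the searchsorted example further down. In NumPy 2.0 and later, np.in1d is deprecated in favour of np.isin, which gives the same result for 1D input, so the sketch uses np.isin.

```python
import numpy as np

a = np.array([1, 2, 4])
b = np.array([4, 2, 3, 1])

idx = np.isin(b, a).nonzero()[0]
print(idx.tolist())        # [0, 1, 3]
# The values come back in b's order (4, 2, 1), not in a's order (1, 2, 4)
print(b[idx].tolist())     # [4, 2, 1]
```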
In that case, a much better answer is the one @Jaime gives, using searchsorted:
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])
This returns the indices for values as they appear in a. For instance:
a = np.array([1, 2, 4])
b = np.array([4, 2, 3, 1])
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
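This relies on every value of a occurring in b; otherwise searchsorted silently returns the position of some other element, or len(b), which is out of bounds. A sketch of one guard is to clip the positions and compare the values found; the -1 marker for absent values is an arbitrary choice.

```python
import numpy as np

a = np.array([1, 2, 5, 4])        # 5 does not occur in b
b = np.array([4, 2, 3, 1])

sorter = np.argsort(b)
pos = np.searchsorted(b, a, sorter=sorter)
# pos can equal len(b) for values above b's maximum; clip before indexing
cand = sorter[np.clip(pos, 0, len(b) - 1)]
result = np.where(b[cand] == a, cand, -1)
print(result.tolist())  # [3, 1, -1, 0]
```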
Python Numpy find row index of array consisting of 5 values, by searching for 2 values
Here's your data:
import numpy as np
arr = np.array([[ 0, 0, 0, 255, 0],
                [ 0, 1, 0, 0, 255],
                [ 0, 2, 0, 255, 0]])
a,b = 0,2 # [a,b] is what we are looking for, in the first two cols
Here's the solution to get the row index containing [a,b]:
found_index = np.argmax(np.logical_and(arr[:,0]==[a],arr[:,1]==[b]))
print (found_index)
Output:
2
Explanation:
The best way to understand how this works is by printing each part of it:
print (arr[:,0]==[a])
Outputs:
[ True True True]
print (arr[:,1]==[b])
Outputs:
[False False True]
print (np.logical_and(arr[:,0]==[a],arr[:,1]==[b]))
# print (np.logical_and([ True True True], [False False True]))
Outputs:
[False False True]
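One caveat worth knowing: np.argmax of an all-False mask returns 0, so a pair that occurs nowhere is indistinguishable from a match in row 0. A sketch that avoids this uses np.flatnonzero, which returns every matching row and an empty array when there is none. The helper name rows_starting_with is hypothetical, and the pair (0, 5) is a deliberate miss.

```python
import numpy as np

arr = np.array([[0, 0, 0, 255, 0],
                [0, 1, 0, 0, 255],
                [0, 2, 0, 255, 0]])

def rows_starting_with(arr, a, b):
    # Boolean mask of rows whose first two columns equal (a, b)
    mask = (arr[:, 0] == a) & (arr[:, 1] == b)
    return np.flatnonzero(mask)

print(rows_starting_with(arr, 0, 2).tolist())   # [2]
print(rows_starting_with(arr, 0, 5).tolist())   # [] (no such row)
print(np.argmax((arr[:, 0] == 0) & (arr[:, 1] == 5)))  # 0, looks like a match
```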
Find index of a row in numpy array
In case the source array contains the row you are looking for more than once, the function below returns all of the matching indices.
import numpy as np
def find_rows(source, target):
    # Indices of all rows of source that equal target element-wise
    return np.where((source == target).all(axis=1))[0]
looking = [10, 20, 30]
Y = np.array([[1, 2, 3],
              [10, 20, 30],
              [100, 200, 300],
              [10, 20, 30]])
print(find_rows(source=Y, target=looking))  # [1 3]
Numpy Array Get row index searching by a row
Why not simply do something like this?
>>> a
array([[ 0., 5., 2.],
       [ 0., 0., 3.],
       [ 0., 0., 0.]])
>>> b
array([ 0., 0., 3.])
>>> a==b
array([[ True, False, False],
       [ True, True, True],
       [ True, True, False]], dtype=bool)
>>> np.all(a==b,axis=1)
array([False, True, False], dtype=bool)
>>> np.where(np.all(a==b,axis=1))
(array([1]),)
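Because a and b here are float arrays, exact == only works when the values are bit-for-bit identical. If the searched row may carry rounding error, a variant (a swap-in, not part of the answer) is to compare with np.isclose, which applies its default rtol/atol tolerances unless you pass your own. The perturbed b below is hypothetical.

```python
import numpy as np

a = np.array([[0., 5., 2.],
              [0., 0., 3.],
              [0., 0., 0.]])
# Same row as before, but with a tiny hypothetical error in the last element
b = np.array([0., 0., 3.0000000001])

exact = np.where(np.all(a == b, axis=1))[0]
close = np.where(np.isclose(a, b).all(axis=1))[0]
print(exact.tolist(), close.tolist())  # [] [1]
```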
Getting the row index for a 2D numPy array when multiple column values are known
Here are ways to handle conditions on columns or rows, inspired by the Zen of Python.
In []: import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
...
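So, following the second aphorism (the session below uses arange and logical_and unqualified, so it assumes NumPy's names are already in the namespace, e.g. via from numpy import *):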
a) conditions on column(s), applied to row(s):
In []: a= arange(12).reshape(3, 4)
In []: a
Out[]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
In []: a[2, logical_and(1== a[0, :], 5== a[1, :])]+= 12
In []: a
Out[]:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 21, 10, 11]])
b) conditions on row(s), applied to column(s):
In []: a= a.T
In []: a
Out[]:
array([[ 0, 4, 8],
       [ 1, 5, 21],
       [ 2, 6, 10],
       [ 3, 7, 11]])
In []: a[logical_and(1== a[:, 0], 5== a[:, 1]), 2]+= 12
In []: a
Out[]:
array([[ 0, 4, 8],
       [ 1, 5, 33],
       [ 2, 6, 10],
       [ 3, 7, 11]])
So I hope it makes sense to always be explicit when accessing columns and rows. Code is typically read by people with various backgrounds.
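For the question in the heading itself, getting the row index rather than modifying the array, the same explicit column mask can be passed to np.flatnonzero. A sketch on the transposed array from b), taken before the += 12 updates:

```python
import numpy as np

a = np.arange(12).reshape(3, 4).T   # [[0 4 8] [1 5 9] [2 6 10] [3 7 11]]
# Rows where column 0 equals 1 and column 1 equals 5
mask = np.logical_and(1 == a[:, 0], 5 == a[:, 1])
print(np.flatnonzero(mask).tolist())  # [1]
```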
How to find the indices of an ndarray in another ndarray
You could use np.argwhere as follows:
import numpy as np
a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8],
              [9, 10, 11],
              [12, 13, 14],
              [15, 16, 17],
              [18, 19, 20]])
b = np.array([[3, 4, 5],
              [9, 10, 11]])
res = np.argwhere(
    (a == b[:, None])  # compare all rows of a vs b
    .all(axis=2)  # find the ones where all the elements match
)[:, 1]  # use argwhere to find indices, but only use the column indices
print(res)
Output
[1 3]
UPDATE
To find the rows of b that have no match in a, do the following; I split the steps to make them easier to understand:
matches = (a == b[:, None]).all(axis=2)
print(matches)
res = np.argwhere(~matches.any(axis=1))[:, 0]
print(res)
Output
[[False True False False False False False]
[False False False False False False False]]
[1]
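The first part of the output shows two rows that correspond to the rows in b; as can be seen, the first row of b has a match in the second row of a, and the second row of b has no matches. (This output comes from a b whose second row does not occur in a; with the b defined above, [9, 10, 11] matches the fourth row of a, so the second row of matches would have a True at index 3 and res would be empty.)
The second part of the output shows the result of applying argwhere to select the indices of the rows of b for which no row of a matches (~matches.any(axis=1)).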
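If you need, for each row of b, the index of its match in a (or a marker when there is none), the same matches table can be reduced with any and argmax. The b below is hypothetical, with a second row that deliberately has no match, and -1 is an arbitrary marker.

```python
import numpy as np

a = np.arange(21).reshape(7, 3)      # same values as a above: [[0 1 2] ... [18 19 20]]
b = np.array([[3, 4, 5],
              [9, 10, 12]])          # hypothetical: second row is not in a

matches = (a == b[:, None]).all(axis=2)   # shape (len(b), len(a))
# argmax gives the first matching row of a; any() tells whether there was one
idx = np.where(matches.any(axis=1), matches.argmax(axis=1), -1)
print(idx.tolist())  # [1, -1]
```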