numpy.where(): Detailed, Step-by-Step Explanation/Examples

How do I use numpy.where()? What should I pass, and what does the result mean?

After fiddling around for a while, I figured things out, and am posting them here in the hope that they help others.

Intuitively, np.where is like asking "tell me where in this array entries satisfy a given condition".

>>> a = np.arange(5,10)
>>> np.where(a < 8) # tell me where in a, entries are < 8
(array([0, 1, 2]),) # answer: entries indexed by 0, 1, 2

It can also be used to get the entries in the array that satisfy the condition:

>>> a[np.where(a < 8)] 
array([5, 6, 7]) # selects from a entries 0, 1, 2

When a is a 2d array, np.where() returns a tuple of two arrays: the row indices and the column indices of the matching entries:

>>> a = np.arange(4,10).reshape(2,3)
>>> a
array([[4, 5, 6],
       [7, 8, 9]])
>>> np.where(a > 8)
(array([1]), array([2]))

As in the 1d case, we can use np.where() to get entries in the 2d array that satisfy the condition:

>>> a[np.where(a > 8)]   # selects the entry at row 1, column 2
array([9])


Note: np.where() returns one index array per dimension of a. When a is 1d, that means a 1-tuple holding a single index array, which is why the 1d result above prints as (array([0, 1, 2]),) with a trailing comma. There is no empty column array.
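np.where() returns one index array per dimension of its input. A quick way to check this is to compare the length of the returned tuple with a.ndim. The sketch below reuses the 1d and 2d arrays from above and adds a small 3d array chosen only for illustration.

```python
import numpy as np

a1 = np.arange(5, 10)                  # 1d
a2 = np.arange(4, 10).reshape(2, 3)    # 2d
a3 = np.arange(24).reshape(2, 3, 4)    # 3d, illustrative only

for a in (a1, a2, a3):
    idx = np.where(a > 7)
    # one index array per dimension of the input
    print(a.ndim, len(idx))

# In 3d the tuple holds (axis 0, axis 1, axis 2) indices of each match
d, r, c = np.where(a3 == 17)
print(d, r, c)   # [1] [1] [1]
```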

np.where() solution explanation

np.where outputs a tuple of arrays, one per dimension (see "output of numpy.where(condition) is not an array, but a tuple of arrays: why?"), so you have to index it (hence the first [0]). That gives a numpy array of the matching indices. There is only one match in this case, so the second [0] works. The tolist() is completely redundant, though.

It'd be better to extend list1 with the found indexes, because this code fails when an element occurs more than once:

import numpy as np

list1 = []
for i in ser2:
    # np.where returns a tuple; [0] is the array of all positions where ser1 == i
    list1.extend(np.where(i == ser1)[0])
print(list1)

Not the best code imo.

Tip: just check the output of things yourself and you would have figured this out. Run np.where(i == ser1) and you'd have seen that it returns a tuple, which you need to index, and so on.
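To see the difference between keeping only the first match and extending with all matches, here is a small sketch. ser1 and ser2 are stood in for by plain numpy arrays with made-up values (hypothetical; in the original question they may be pandas Series).

```python
import numpy as np

# Hypothetical stand-ins for ser1 and ser2; 3 occurs twice in ser1
ser1 = np.array([3, 7, 3, 9])
ser2 = np.array([3, 9])

# np.where returns a tuple, so [0] picks the index array for the only axis
print(np.where(3 == ser1))   # a 1-tuple holding array([0, 2])

# First match only: the second [0] silently drops index 2
first_only = [np.where(i == ser1)[0][0] for i in ser2]

# All matches: extend with the whole index array
list1 = []
for i in ser2:
    list1.extend(np.where(i == ser1)[0])

print(first_only)   # indices 0 and 3
print(list1)        # indices 0, 2 and 3
```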

How does python numpy.where() work?

How do they achieve internally that you are able to pass something like x > 5 into a method?

The short answer is that they don't.

Any comparison on a numpy array returns a boolean array (i.e. __gt__, __lt__, etc. all return boolean arrays that are True where the given condition holds).

E.g.

>>> x = np.arange(9).reshape(3, 3)
>>> x > 5
array([[False, False, False],
       [False, False, False],
       [ True,  True,  True]])

This is the same reason why something like if x > 5: raises a ValueError if x is a numpy array. It's an array of True/False values, not a single value.
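A short demonstration of that error, and of the usual fix: reduce the boolean array to a single value with .any() or .all(), depending on which question the test is meant to ask.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

raised = False
try:
    if x > 5:        # ambiguous: 9 booleans, not one
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Reduce the boolean array explicitly instead
print((x > 5).any())   # True: at least one entry is > 5
print((x > 5).all())   # False: not every entry is > 5
```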

Furthermore, numpy arrays can be indexed by boolean arrays. E.g. x[x>5] yields [6 7 8], in this case.

Honestly, it's fairly rare that you actually need numpy.where: called with a single argument, it just returns the indices where a boolean array is True. Usually you can do what you need with simple boolean indexing.
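For the single-argument case, np.where(cond) gives the same indices as np.nonzero(cond) (the NumPy docs describe it as a shorthand for that), and indexing with those indices picks the same elements as the boolean mask itself. A small check:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
mask = x > 5

idx_where = np.where(mask)
idx_nonzero = np.nonzero(mask)

# Both functions produce the same index tuple
same_indices = all(np.array_equal(a, b) for a, b in zip(idx_where, idx_nonzero))

# Integer-index selection and boolean-mask selection pick the same elements
print(x[idx_where])   # [6 7 8]
print(x[mask])        # [6 7 8]
```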

Numpy where() on a 2D matrix

For the general case, where your search string can be in any column, you can do this:

>>> rows, cols = np.where(t == 'bar')
>>> t[rows]
array([['2', '3', '4', 'bar'],
       ['8', '9', '1', 'bar']],
      dtype='|S11')
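t comes from the original question and isn't shown, so the sketch below builds a hypothetical t with made-up values. One thing to watch: rows has one entry per match, so a row containing 'bar' twice comes back twice. np.unique(rows), or a boolean mask reduced with .any(axis=1), returns each matching row once.

```python
import numpy as np

# Hypothetical stand-in for the question's t (values are made up)
t = np.array([['2', '3', '4', 'bar'],
              ['5', 'bar', '7', 'bar'],   # 'bar' appears twice here
              ['8', '9', '1', '0']])

rows, cols = np.where(t == 'bar')
print(rows)   # row index of every match, duplicates included

dup_rows = t[rows]                          # row 1 appears twice
unique_rows = t[np.unique(rows)]            # each matching row once
mask_rows = t[(t == 'bar').any(axis=1)]     # same result via a boolean mask
```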

Inner workings of np.where() and how to check for emptiness/Noneness

Your test array:

In [57]: arr = np.array([4,5,6])
In [58]: arr
Out[58]: array([4, 5, 6])

The test produces a boolean array:

In [59]: arr>6
Out[59]: array([False, False, False])

Searching for non-zeros (True values) in that array finds none. np.where with only a condition is equivalent to np.nonzero, and as per the docs the result is a tuple with one array per dimension of the input:

In [60]: np.nonzero(arr>6)
Out[60]: (array([], dtype=int64),)
In [61]: _[0]
Out[61]: array([], dtype=int64)

Out[61].size is 0. Out[61].shape is (0,).
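Since np.where never returns None, checking for "no matches" is about the size of the index array. One pitfall: the returned tuple itself is always truthy, because it holds one element per dimension even when those elements are empty arrays. A sketch:

```python
import numpy as np

arr = np.array([4, 5, 6])
result = np.where(arr > 6)

print(result is None)   # False: np.where returns a tuple, never None
print(bool(result))     # True, even though nothing matched

idx = result[0]         # index array for the only axis
if idx.size == 0:
    print("no matches")
```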

A more interesting threshold:

In [62]: np.where(arr>4)
Out[62]: (array([1, 2]),)
In [63]: np.nonzero(arr>4)
Out[63]: (array([1, 2]),)

This tuple can be used directly to index the original array:

In [64]: arr[_]
Out[64]: array([5, 6])

Out[62] is also a valid indexing tuple.

The tuple nature of the result becomes more interesting, and useful, when we work on a 2 or 3d array.

For example, multiples of 3 in a 2d array:

In [65]: arr = np.arange(12).reshape(3,4)
In [66]: arr
Out[66]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [67]: (arr % 3)==0
Out[67]:
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]])
In [68]: np.nonzero(_)
Out[68]: (array([0, 0, 1, 2]), array([0, 3, 2, 1]))
In [69]: arr[_]
Out[69]: array([0, 3, 6, 9])
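The same index tuple also works on the left side of an assignment, and zipping its arrays together gives (row, column) pairs. A short sketch using the arr above (rebuilt so the block runs on its own):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
idx = np.nonzero(arr % 3 == 0)

# (row, col) coordinates of each multiple of 3
pairs = list(zip(*(i.tolist() for i in idx)))
print(pairs)   # [(0, 0), (0, 3), (1, 2), (2, 1)]

# Use the tuple to modify only those positions
arr[idx] = -1
print(arr)
```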

Is there a numpy function that allows you to specify start, step, and number?

A deleted answer pointed out that linspace takes an endpoint parameter.

With that, the examples given in other answers can be written as:

In [955]: np.linspace(0, 0+(0.1*3),3,endpoint=False)
Out[955]: array([ 0. , 0.1, 0.2])

In [956]: np.linspace(0, 0+(5*3),3,endpoint=False)
Out[956]: array([ 0., 5., 10.])

In [957]: np.linspace(0, 0+(1.25*9),9,endpoint=False)
Out[957]: array([ 0. , 1.25, 2.5 , 3.75, 5. , 6.25, 7.5 , 8.75, 10. ])

Look at the functions defined in numpy.lib.index_tricks for other ideas on how to generate ranges and/or grids. For example, np.ogrid[0:10:9j] behaves like linspace.

import numpy as np

def altspace(start, step, count, endpoint=False, **kwargs):
    # with endpoint=False, this stop makes consecutive values differ by `step`
    stop = start+(step*count)
    return np.linspace(start, stop, count, endpoint=endpoint, **kwargs)
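Two other ways to get count values starting at start with spacing step, offered as a sketch rather than a replacement: scaling np.arange(count), which needs no stop value at all, and the np.ogrid form mentioned above. Both are compared against the linspace call from the third example.

```python
import numpy as np

start, step, count = 0, 1.25, 9

via_linspace = np.linspace(start, start + step * count, count, endpoint=False)
# length is exactly count, unlike np.arange with a float step
via_arange = start + step * np.arange(count)
# complex step = number of points, endpoint included
via_ogrid = np.ogrid[0:10:9j]

print(via_linspace)
print(via_arange)
print(via_ogrid)
```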

Numpy: calculate based on previous element?

Let's build a few of the items in your sequence:

y[0] = 2*y[-1] + x[0]
y[1] = 2*y[0] + x[1] = 4*y[-1] + 2*x[0] + x[1]
y[2] = 2*y[1] + x[2] = 8*y[-1] + 4*x[0] + 2*x[1] + x[2]
...
y[n] = 2**(n+1)*y[-1] + 2**n*x[0] + 2**(n-1)*x[1] + ... + x[n]

It may not be immediately obvious, but you can build the above sequence with numpy doing something like:

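import numpy as np

x = np.arange(1, 11)  # assumed input, inferred from the output shown below
n = len(x)
y_1 = 50  # initial value, i.e. y[-1]
pot = 2**np.arange(n-1, -1, -1)
# floor division is exact: each cumsum term is a multiple of the matching pot entry
y = np.cumsum(pot * x) // pot + y_1 * 2**np.arange(1, n+1)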
>>> y
array([ 101, 204, 411, 826, 1657, 3320, 6647, 13302, 26613, 53236])

The downside to this type of solution is that it is not very general: a small change in your problem may render the whole approach useless. But whenever you can solve a problem with a little algebra, it is almost certainly going to beat any algorithmic approach by a wide margin.
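A plain loop is a handy reference for checking the closed form. The sketch below assumes the recurrence y[n] = 2*y[n-1] + x[n] with y[-1] = 50 and x = 1..10, which reproduces the output above; recurrence_loop is a hypothetical helper name. One caveat of the vectorized version: the powers of 2 grow quickly, so with int64 it overflows for inputs of a few dozen elements or more, where the loop (or floats, with rounding) may be the safer choice.

```python
import numpy as np

def recurrence_loop(x, y_1):
    """Reference version: y[k] = 2*y[k-1] + x[k], starting from y[-1] = y_1."""
    y = np.empty(len(x), dtype=np.int64)
    prev = y_1
    for k, xk in enumerate(x):
        prev = 2 * prev + xk
        y[k] = prev
    return y

x = np.arange(1, 11)
n = len(x)
y_1 = 50
pot = 2**np.arange(n-1, -1, -1)
y_closed = np.cumsum(pot * x) // pot + y_1 * 2**np.arange(1, n+1)

y_loop = recurrence_loop(x, y_1)
print(np.array_equal(y_closed, y_loop))   # True
```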


