How do I use numpy.where()? What should I pass, and what does the result mean?
After fiddling around for a while, I figured things out, and am posting them here hoping it will help others.
Intuitively, np.where is like asking: "tell me where in this array the entries satisfy a given condition".
>>> a = np.arange(5,10)
>>> np.where(a < 8) # tell me where in a, entries are < 8
(array([0, 1, 2]),) # answer: entries indexed by 0, 1, 2
It can also be used to get the entries of the array that satisfy the condition:
>>> a[np.where(a < 8)]
array([5, 6, 7]) # selects from a entries 0, 1, 2
When a is a 2d array, np.where() returns an array of row indices and an array of column indices:
>>> a = np.arange(4,10).reshape(2,3)
>>> a
array([[4, 5, 6],
       [7, 8, 9]])
>>> np.where(a > 8)
(array([1]), array([2]))
As in the 1d case, we can use np.where() to get the entries in the 2d array that satisfy the condition:
>>> a[np.where(a > 8)] # selects from a the entry at row 1, column 2
array([9])
Note that when a is 1d, np.where() returns a tuple containing a single array of indices (one array per dimension), which is why the 1d result is written (array([0, 1, 2]),) with a trailing comma. There is no separate array of column indices.
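To see the shape of the result for both cases side by side, here is a minimal sketch. The variable names a1, a2, idx1, rows and cols are only illustrative and do not come from the original answer.

```python
import numpy as np

a1 = np.arange(5, 10)                 # 1-D array: 5, 6, 7, 8, 9
idx1 = np.where(a1 < 8)               # tuple with one index array (one per dimension)

a2 = np.arange(4, 10).reshape(2, 3)   # 2-D array
rows, cols = np.where(a2 > 8)         # tuple with two arrays: row and column indices

print(len(idx1), idx1[0])             # 1 [0 1 2]
print(rows, cols)                     # [1] [2]

# Indexing with the tuple gives the same entries as plain boolean indexing.
print(np.array_equal(a2[rows, cols], a2[a2 > 8]))  # True
```

The length of the tuple always matches the number of dimensions of the input, so unpacking into rows, cols only works for 2d input.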
np.where() solution explanation
np.where outputs a tuple (see "output of numpy.where(condition) is not an array, but a tuple of arrays: why?"), so you have to index it; hence the first [0]. What you get then is a numpy array of the matching indices. There is only one in this case, so the second [0] works. The tolist() is completely redundant, though.
It would be better to extend list1 with all the found indices, because the original code fails when an element occurs more than once:
list1 = []
for i in ser2:
    # np.where returns a 1-tuple; [0] is the array of positions where ser1 equals i
    list1.extend(np.where(i == ser1)[0])
print(list1)
Not the best code, imo.
Tip: check the output of things yourself and you would have figured this out. Just run np.where(i==ser1) and you'd have seen that it returns a tuple, which you need to index.
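A small runnable sketch of that idea follows. The contents of ser1 and ser2 are made up here (the original data is not shown), and plain numpy arrays are assumed rather than pandas Series.

```python
import numpy as np

# Hypothetical data: ser1 is searched, ser2 holds the values to look for.
ser1 = np.array([10, 9, 6, 5, 3, 1, 12, 8, 13, 6])
ser2 = np.array([6, 13])

list1 = []
for value in ser2:
    matches = np.where(ser1 == value)[0]   # [0] unpacks the 1-tuple
    list1.extend(matches.tolist())         # keeps every match, not only the first

print(list1)  # [2, 9, 8]
```

Because 6 occurs twice in ser1, both of its positions end up in list1, which is the case where taking only the first match would lose information.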
How does python numpy.where() work?
How do they achieve internally that you are able to pass something like x > 5 into a method?
The short answer is that they don't.
Any sort of logical operation on a numpy array returns a boolean array (i.e. __gt__, __lt__, etc. all return boolean arrays that are True where the given condition holds).
E.g.
x = np.arange(9).reshape(3,3)
print(x > 5)
yields:
[[False False False]
 [False False False]
 [ True  True  True]]
This is the same reason why something like if x > 5: raises a ValueError if x is a numpy array with more than one element. It's an array of True/False values, not a single value.
Furthermore, numpy arrays can be indexed by boolean arrays. E.g. x[x>5] yields [6 7 8] in this case.
Honestly, it's fairly rare that you actually need numpy.where, but it just returns the indices where a boolean array is True. Usually you can do what you need with simple boolean indexing.
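The points above can be checked in a few lines. This is only a sketch; the name mask is introduced here for readability and is not part of the original answer.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
mask = x > 5                  # the comparison itself builds a boolean array

try:
    if mask:                  # ambiguous: nine truth values, not one
        pass
except ValueError as exc:
    print("ValueError:", exc)

print(x[mask])                # [6 7 8]
print(x[np.where(mask)])      # [6 7 8]
```

np.where never sees the expression x > 5; it only receives the boolean array that the comparison has already produced.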
Numpy where() on a 2D matrix
For the general case, where your search string can be in any column, you can do this:
>>> rows, cols = np.where(t == 'bar')
>>> t[rows]
array([['2', '3', '4', 'bar'],
['8', '9', '1', 'bar']],
dtype='|S11')
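Since the array t is not shown in the text, here is a self-contained sketch with a made-up table where the search string sits in different columns. The contents of t are an assumption for illustration only.

```python
import numpy as np

# Hypothetical table; the original t is not shown.
t = np.array([['1', '2', 'bar', '3'],
              ['4', '5', '6', '7'],
              ['8', 'bar', '9', '0']])

rows, cols = np.where(t == 'bar')   # positions of every cell equal to 'bar'
print(rows)                         # [0 2]
print(cols)                         # [2 1]
print(t[rows])                      # the full rows that contain 'bar'
```

If a row contained the string more than once, its index would appear more than once in rows; np.unique(rows) would remove the repeats.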
Inner workings of np.where() and how to check for emptiness/Noneness
Your test array:
In [57]: arr = np.array([4,5,6])
In [58]: arr
Out[58]: array([4, 5, 6])
The test produces a boolean array:
In [59]: arr>6
Out[59]: array([False, False, False])
Searching for non-zeros (True) in that array finds none. As per the docs, the result is a tuple, one array per dimension of the input:
In [60]: np.nonzero(arr>6)
Out[60]: (array([], dtype=int64),)
In [61]: _[0]
Out[61]: array([], dtype=int64)
Out[61].size is 0. Out[61].shape is (0,).
A more interesting threshold:
In [62]: np.where(arr>4)
Out[62]: (array([1, 2]),)
In [63]: np.nonzero(arr>4)
Out[63]: (array([1, 2]),)
This tuple can be used directly to index the original array:
In [64]: arr[_]
Out[64]: array([5, 6])
The empty result Out[60] is also a valid indexing tuple; it simply selects no elements.
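For the emptiness check itself, one possible sketch is to test the size of the index array rather than comparing against None, since np.where returns an empty array, never None. The helper name first_match is hypothetical.

```python
import numpy as np

arr = np.array([4, 5, 6])

def first_match(values, threshold):
    """Return the first index where values > threshold, or None if there is none."""
    idx = np.where(values > threshold)[0]   # index array for the only dimension
    if idx.size == 0:                       # empty result: nothing matched
        return None
    return int(idx[0])

print(first_match(arr, 6))   # None
print(first_match(arr, 4))   # 1
```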
The tuple nature of the result becomes more interesting, and useful, when we work on a 2 or 3d array.
For example, multiples of 3 in a 2d array:
In [65]: arr = np.arange(12).reshape(3,4)
In [66]: arr
Out[66]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [67]: (arr % 3)==0
Out[67]:
array([[ True, False, False, True],
[False, False, True, False],
[False, True, False, False]])
In [68]: np.nonzero(_)
Out[68]: (array([0, 0, 1, 2]), array([0, 3, 2, 1]))
In [69]: arr[_]
Out[69]: array([0, 3, 6, 9])
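If coordinate pairs are more convenient than separate row and column arrays, the two arrays can be zipped together. A minimal sketch, with coords as an illustrative name:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
rows, cols = np.nonzero(arr % 3 == 0)    # same tuple as in the session above

coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)                # [(0, 0), (0, 3), (1, 2), (2, 1)]
print(arr[rows, cols])       # [0 3 6 9]
```

np.argwhere(arr % 3 == 0) returns the same pairs as a 2d array, one row per match, but that form cannot be used directly as an index the way the tuple can.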
Is there a numpy function that allows you to specify start, step, and number?
A deleted answer pointed out that linspace takes an endpoint parameter.
With that, the examples given in other answers can be written as:
In [955]: np.linspace(0, 0+(0.1*3),3,endpoint=False)
Out[955]: array([ 0. , 0.1, 0.2])
In [956]: np.linspace(0, 0+(5*3),3,endpoint=False)
Out[956]: array([ 0., 5., 10.])
In [957]: np.linspace(0, 0+(1.25*9),9,endpoint=False)
Out[957]: array([ 0. , 1.25, 2.5 , 3.75, 5. , 6.25, 7.5 , 8.75, 10. ])
Look at the functions defined in numpy.lib.index_tricks for other ideas on how to generate ranges and/or grids. For example, np.ogrid[0:10:9j] behaves like linspace.
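The linspace calls above can be wrapped in a small helper:
def altspace(start, step, count, endpoint=False, **kwargs):
    # stop lies one step past the last value; endpoint=False leaves it out
    stop = start+(step*count)
    return np.linspace(start, stop, count, endpoint=endpoint, **kwargs)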
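A short usage sketch of the helper, repeated here so the block runs on its own. The comparison against start + step * np.arange(count) is added only as a sanity check and is not part of the original answer.

```python
import numpy as np

def altspace(start, step, count, endpoint=False, **kwargs):
    # stop lies one step past the last value; endpoint=False leaves it out
    stop = start + step * count
    return np.linspace(start, stop, count, endpoint=endpoint, **kwargs)

print(altspace(0, 5, 3))       # values 0, 5, 10
print(altspace(2, 0.5, 4))     # values 2.0, 2.5, 3.0, 3.5

# Same values as building the range by hand from start, step and count.
print(np.allclose(altspace(2, 0.5, 4), 2 + 0.5 * np.arange(4)))  # True
```

Passing endpoint=True changes the spacing, because linspace then spreads count points over the whole interval including stop, so the default of False is what keeps the step equal to the requested one.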
Numpy: calculate based on previous element?
Let's build a few of the items in your sequence:
y[0] = 2*y[-1] + x[0]
y[1] = 2*y[0] + x[1] = 4*y[-1] + 2*x[0] + x[1]
y[2] = 2*y[1] + x[2] = 8*y[-1] + 4*x[0] + 2*x[1] + x[2]
...
y[n] = 2**(n+1)*y[-1] + 2**n*x[0] + 2**(n-1)*x[1] + ... + x[n]
It may not be immediately obvious, but you can build the above sequence with numpy doing something like:
n = len(x)
y_1 = 50
pot = 2**np.arange(n-1, -1, -1)
y = np.cumsum(pot * x) // pot + y_1 * 2**np.arange(1, n+1)  # // keeps integers; the division is exact
>>> y
array([ 101, 204, 411, 826, 1657, 3320, 6647, 13302, 26613, 53236])
The downside of this type of solution is that it is not very general: a small change in your problem may render the whole approach useless. But whenever you can solve a problem with a little algebra, it is almost certainly going to beat any algorithmic approach by a wide margin.
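To check the closed form against the recurrence y[k] = 2*y[k-1] + x[k], here is a sketch that computes both. The input x = np.arange(1, 11) is an assumption: it is not given in the text, but it reproduces the output shown above with y_1 = 50.

```python
import numpy as np

x = np.arange(1, 11)      # assumed input, chosen to match the output shown above
y_1 = 50                  # the value called y[-1] in the derivation
n = len(x)

# Reference: apply the recurrence one element at a time.
y_loop = np.empty(n, dtype=np.int64)
prev = y_1
for k in range(n):
    prev = 2 * prev + x[k]
    y_loop[k] = prev

# Closed form, using // so the result stays integer (the division is exact).
pot = 2 ** np.arange(n - 1, -1, -1)
y_vec = np.cumsum(pot * x) // pot + y_1 * 2 ** np.arange(1, n + 1)

print(np.array_equal(y_loop, y_vec))   # True
print(y_vec[:3])                       # [101 204 411]
```

Note that the powers of two grow quickly, so for long inputs the integer arrays can overflow; the loop version has the same limit once values exceed the 64-bit range.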