Transform a set of numbers in numpy so that each number gets converted into the number of other numbers which are less than it
What you actually need to do is get the inverse of the sorting order of your array:
import numpy as np
x = np.random.rand(10)
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)
Example run (in ipython):
In [367]: x
Out[367]:
array([ 0.09139335,  0.29084225,  0.43560987,  0.92334644,  0.09868977,
        0.90202354,  0.80905083,  0.4801967 ,  0.99086213,  0.00933582])
In [368]: y
Out[368]: array([1, 3, 4, 8, 2, 7, 6, 5, 9, 0])
Alternatively, if you want the number of elements greater than each corresponding element in x, you have to reverse the sorting from ascending to descending. One option is simply to reverse the index array during construction:
y_rev = np.empty(x.size, dtype=np.int64)
y_rev[x.argsort()] = np.arange(x.size)[::-1]
Another, as @unutbu suggested in a comment, is to map the ascending result directly to the descending one:
y_rev = x.size - y - 1
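To see that this really computes the count named in the question, a quick sanity check helps: for distinct values, y[i] equals the number of elements strictly less than x[i], which an O(n²) pairwise comparison confirms. A small sketch (the brute-force check is only for validation, not for production use):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10)  # distinct values with probability 1

# Rank each element: y[i] = number of elements strictly less than x[i]
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)

# Brute-force verification via a pairwise comparison matrix
counts = (x[:, None] > x).sum(axis=1)
assert np.array_equal(y, counts)
```

Note that with ties the two definitions diverge (argsort breaks ties arbitrarily), which is why the answer assumes distinct values.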
Replace all elements of Python NumPy Array that are greater than some value
I think both the fastest and most concise way to do this is to use NumPy's built-in fancy indexing. If you have an ndarray named arr, you can replace all elements >255 with a value x as follows:
arr[arr > 255] = x
I ran this on my machine with a 500 x 500 random matrix, replacing all values >0.5 with 5, and it took an average of 7.59 ms.
In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
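One caveat: the fancy-indexing assignment mutates arr in place. If you still need the original values, np.where builds a replaced copy instead; a small sketch (the replacement value 0 here is just an arbitrary choice):

```python
import numpy as np

a = np.array([100, 300, 42, 256])

# np.where returns a new array instead of mutating `a` in place,
# so the original values remain available afterwards
capped = np.where(a > 255, 0, a)
# capped -> [100, 0, 42, 0]; a is unchanged
```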
Numpy: change max in each row to 1, all other numbers to 0
Method #1, tweaking yours:
>>> a = np.array([[0, 1], [2, 3], [4, 5], [6, 7], [9, 8]])
>>> b = np.zeros_like(a)
>>> b[np.arange(len(a)), a.argmax(1)] = 1
>>> b
array([[0, 1],
       [0, 1],
       [0, 1],
       [0, 1],
       [1, 0]])
[Actually, range will work just fine; I wrote arange out of habit.]
Method #2, using max instead of argmax to handle the case where multiple elements reach the maximum value:
>>> a = np.array([[0, 1], [2, 2], [4, 3]])
>>> (a == a.max(axis=1)[:,None]).astype(int)
array([[0, 1],
       [1, 1],
       [1, 0]])
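The [:, None] reshape in Method #2 can equivalently be written with keepdims=True, which keeps the reduced axis so the row maxima broadcast back against a without manual reshaping; a minor variant of the same method:

```python
import numpy as np

a = np.array([[0, 1], [2, 2], [4, 3]])

# keepdims=True leaves the reduced axis in place (shape (3, 1)),
# so the comparison broadcasts row-wise automatically
b = (a == a.max(axis=1, keepdims=True)).astype(int)
# b -> [[0, 1], [1, 1], [1, 0]]
```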
Replacing Numpy elements if condition is met
>>> import numpy as np
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[4, 2, 1, 1],
       [3, 0, 1, 2],
       [2, 0, 1, 1],
       [4, 0, 2, 3],
       [0, 0, 0, 2]])
>>> b = a < 3
>>> b
array([[False,  True,  True,  True],
       [False,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True]], dtype=bool)
>>>
>>> c = b.astype(int)
>>> c
array([[0, 1, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1],
       [0, 1, 1, 0],
       [1, 1, 1, 1]])
You can shorten this with:
>>> c = (a < 3).astype(int)
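The same one-liner can also be spelled with np.where, which skips the boolean-to-int cast and generalizes to replacement values other than 0/1:

```python
import numpy as np

a = np.array([[4, 2, 1, 3],
              [0, 5, 2, 1]])

# 1 where the condition holds, 0 elsewhere
# (any pair of values could be substituted here)
c = np.where(a < 3, 1, 0)
# c -> [[0, 1, 1, 0], [1, 0, 1, 1]]
```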
Getting the indices of several rows in a NumPy array at once
Given A and B, you can generate C using:
In [25]: (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
Out[25]: array([0, 1, 0, 2, 1, 2, 2])
Note that this assumes that every row of B is in A. (Otherwise, argmax could return bogus indices where the equality is False.)
Note that if you have NumPy version 1.13 or newer, you can use np.unique to generate both B and C at the same time:
In [33]: np.unique(A, axis=0, return_inverse=True)
Out[33]:
(array([[1, 2, 3],
        [2, 2, 2],
        [2, 3, 3]]), array([0, 1, 0, 2, 1, 2, 2]))
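return_inverse means the second array indexes back into the unique rows, so B[C] reconstructs A row for row; a quick sanity check using an A that produces the inverse shown above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 2, 2],
              [1, 2, 3],
              [2, 3, 3],
              [2, 2, 2],
              [2, 3, 3],
              [2, 3, 3]])

B, C = np.unique(A, axis=0, return_inverse=True)

# Indexing the unique rows with the inverse array rebuilds the input
# (.ravel() guards against the inverse-shape change in NumPy 2.0)
assert np.array_equal(B[C.ravel()], A)
```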
Note that Divakar's solution (using np.void) is far faster, particularly if A has many rows:
A = np.random.randint(10, size=(1000, 3))
B, C = np.unique(A, axis=0, return_inverse=True)
In [44]: %%timeit
....: A1D, B1D = view1D(A, B)
....: sidx = B1D.argsort()
....: out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]
....:
1000 loops, best of 3: 271 µs per loop
In [45]: %timeit (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
100 loops, best of 3: 15.5 ms per loop
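The helpers view1D and argsort_unique are used in the timing above but never defined there. A plausible reconstruction, following Divakar's usual np.void row-view trick (the exact bodies are an assumption; the byte-level comparison only matches value order here because the entries are small non-negative integers):

```python
import numpy as np

def view1D(a, b):
    # Collapse each row into one opaque np.void scalar so whole rows
    # compare/sort as single 1-D elements (assumed reconstruction of
    # Divakar's helper, which is not shown in the answer above)
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def argsort_unique(idx):
    # Inverse permutation of an argsort result
    out = np.empty(idx.size, dtype=np.int64)
    out[idx] = np.arange(idx.size)
    return out

A = np.random.randint(10, size=(1000, 3))
B, C = np.unique(A, axis=0, return_inverse=True)

# Same pipeline as the benchmark: sort the void view of B, then
# locate each row of A in it with a binary search
A1D, B1D = view1D(A, B)
sidx = B1D.argsort()
out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]
assert np.array_equal(out, C.ravel())
```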
Set numpy array elements to zero if they are above a specific threshold
Generally, list comprehensions are faster than for loops in Python (because Python knows that it doesn't need to care for a lot of things that might happen in a regular for loop):
a = [0 if a_ > thresh else a_ for a_ in a]
but, as @unutbu correctly pointed out, numpy supports boolean indexing, and element-wise comparisons give you boolean index arrays, so:
super_threshold_indices = a > thresh
a[super_threshold_indices] = 0
would be even faster.
Generally, when applying methods on vectors of data, have a look at numpy.ufuncs, which often perform much better than Python functions that you map using any native mechanism.
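As a tiny illustration of the ufunc point, the thresholding itself can be expressed entirely with element-wise ufunc operations, for instance by multiplying by the comparison mask (just one of several equivalent spellings):

```python
import numpy as np

a = np.array([0.2, 0.9, 0.4, 0.7])
thresh = 0.5

# The comparison yields a boolean mask; multiplication (the
# np.multiply ufunc) zeroes everything above the threshold
# in one vectorized pass
result = a * (a <= thresh)
# result -> [0.2, 0.0, 0.4, 0.0]
```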