Transform a Set of Numbers in Numpy So That Each Number Gets Converted into a Number of Other Numbers Which Are Less Than It


What you actually need to do is get the inverse of the sorting order of your array:

import numpy as np

x = np.random.rand(10)
# y[i] = how many elements of x are smaller than x[i]
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)

Example run (in ipython):

In [367]: x
Out[367]:
array([ 0.09139335,  0.29084225,  0.43560987,  0.92334644,  0.09868977,
        0.90202354,  0.80905083,  0.4801967 ,  0.99086213,  0.00933582])

In [368]: y
Out[368]: array([1, 3, 4, 8, 2, 7, 6, 5, 9, 0])
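As a cross-check (my own note, not part of the original answer), the same ranks can be obtained by applying argsort twice: the argsort of a permutation is its inverse, so x.argsort().argsort() gives each element's position in sorted order.

```python
import numpy as np

x = np.random.rand(10)

# Scatter approach from the answer above
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)

# Double argsort: inverts the sorting permutation in one expression
y2 = x.argsort().argsort()

assert (y == y2).all()
assert sorted(y2.tolist()) == list(range(x.size))
```

The double argsort is shorter but sorts twice; the scatter version does a single sort plus one assignment.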

Alternatively, if you want the number of elements greater than each corresponding element of x, reverse the sort order from ascending to descending. One option is simply to assign a reversed range when constructing the index array:

y_rev = np.empty(x.size, dtype=np.int64)
y_rev[x.argsort()] = np.arange(x.size)[::-1]

Another, as @unutbu suggested in a comment, is to map the ascending result y directly onto the descending one:

y_rev = x.size - y - 1
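A quick sanity check (my own sketch, not from the answer) that the two descending variants agree:

```python
import numpy as np

x = np.random.rand(10)

# Ascending ranks, as in the first snippet
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)

# Variant 1: scatter a reversed range
y_rev = np.empty(x.size, dtype=np.int64)
y_rev[x.argsort()] = np.arange(x.size)[::-1]

# Variant 2: map the ascending ranks directly
y_rev2 = x.size - y - 1

assert (y_rev == y_rev2).all()
```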

Replace all elements of Python NumPy Array that are greater than some value

I think both the fastest and most concise way to do this is to use NumPy's built-in boolean (fancy) indexing. If you have an ndarray named arr, you can replace all elements greater than 255 with a value x as follows:

arr[arr > 255] = x

I ran this on my machine with a 500 x 500 random matrix, replacing all values greater than 0.5 with 5, and it took an average of 7.59 ms per loop. (Note that timeit repeats the statement on the already-modified A, so iterations after the first find nothing left to replace.)

In [1]: import numpy as np
In [2]: A = np.random.rand(500, 500)
In [3]: timeit A[A > 0.5] = 5
100 loops, best of 3: 7.59 ms per loop
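If you would rather not modify arr in place, np.where builds a new array instead; this is my own variation on the answer above, not part of it.

```python
import numpy as np

arr = np.array([100, 200, 300, 400])
x = 255

# In-place version from the answer (on a copy, to keep arr intact)
replaced = arr.copy()
replaced[replaced > 255] = x

# Non-mutating equivalent: pick x where the condition holds, arr elsewhere
replaced2 = np.where(arr > 255, x, arr)

assert (replaced == replaced2).all()
```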

Numpy: change max in each row to 1, all other numbers to 0

Method #1, tweaking yours:

>>> a = np.array([[0, 1], [2, 3], [4, 5], [6, 7], [9, 8]])
>>> b = np.zeros_like(a)
>>> b[np.arange(len(a)), a.argmax(1)] = 1
>>> b
array([[0, 1],
       [0, 1],
       [0, 1],
       [0, 1],
       [1, 0]])

[Actually, range will work just fine; I wrote arange out of habit.]

Method #2, using max instead of argmax to handle the case where multiple elements reach the maximum value:

>>> a = np.array([[0, 1], [2, 2], [4, 3]])
>>> (a == a.max(axis=1)[:,None]).astype(int)
array([[0, 1],
       [1, 1],
       [1, 0]])
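A small variation of my own: max with keepdims=True keeps the reduced axis, so you can skip the manual [:,None] reshape.

```python
import numpy as np

a = np.array([[0, 1], [2, 2], [4, 3]])

# keepdims=True leaves a (3, 1) column, which broadcasts against a
b = (a == a.max(axis=1, keepdims=True)).astype(int)

assert b.tolist() == [[0, 1], [1, 1], [1, 0]]
```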

Replacing Numpy elements if condition is met

>>> import numpy as np
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[4, 2, 1, 1],
       [3, 0, 1, 2],
       [2, 0, 1, 1],
       [4, 0, 2, 3],
       [0, 0, 0, 2]])
>>> b = a < 3
>>> b
array([[False,  True,  True,  True],
       [False,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True]], dtype=bool)
>>>
>>> c = b.astype(int)
>>> c
array([[0, 1, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1],
       [0, 1, 1, 0],
       [1, 1, 1, 1]])

You can shorten this with:

>>> c = (a < 3).astype(int)
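If the two output values are something other than 1 and 0, np.where generalizes the same pattern (my own sketch, not from the answer):

```python
import numpy as np

a = np.array([[4, 2, 1], [3, 0, 1]])

# Same result as (a < 3).astype(int)
c = np.where(a < 3, 1, 0)
assert c.tolist() == [[0, 1, 1], [0, 1, 1]]

# Arbitrary replacement values, keeping the rest of a unchanged
d = np.where(a < 3, -1, a)
assert d.tolist() == [[4, -1, -1], [3, -1, -1]]
```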

Getting the indices of several rows in a NumPy array at once

Given A and its array of unique rows B, you can generate the index array C using

In [25]: (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
Out[25]: array([0, 1, 0, 2, 1, 2, 2])

Note that this assumes that every row of B is in A. (Otherwise, argmax could return bogus indices where the equality is False.)


Note that if you have NumPy 1.13 or newer, you can use np.unique to generate both B and C at the same time:

In [33]: np.unique(A, axis=0, return_inverse=True)
Out[33]:
(array([[1, 2, 3],
        [2, 2, 2],
        [2, 3, 3]]),
 array([0, 1, 0, 2, 1, 2, 2]))
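A handy property of return_inverse, easy to verify: indexing the unique rows with the inverse array reconstructs the original. (The ravel call below is my own defensive normalization, since the inverse's shape with axis= has varied across NumPy versions.)

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 2, 2], [1, 2, 3], [2, 3, 3],
              [2, 2, 2], [2, 3, 3], [2, 3, 3]])

B, C = np.unique(A, axis=0, return_inverse=True)
C = np.asarray(C).ravel()  # normalize shape across NumPy versions

# B holds the unique rows; C maps each row of A to its row in B
assert (B[C] == A).all()
assert C.tolist() == [0, 1, 0, 2, 1, 2, 2]
```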

Note that Divakar's solution is far faster, particularly if A has many rows. (The view1D and argsort_unique helpers used in the benchmark below are defined in that answer.)

A = np.random.randint(10, size=(1000, 3))
B, C = np.unique(A, axis=0, return_inverse=True)

In [44]: %%timeit
....: A1D, B1D = view1D(A, B)
....: sidx = B1D.argsort()
....: out = argsort_unique(sidx)[np.searchsorted(B1D, A1D, sorter=sidx)]
....:
1000 loops, best of 3: 271 µs per loop

In [45]: %timeit (B[:,None,:] == A).all(axis=-1).argmax(axis=0)
100 loops, best of 3: 15.5 ms per loop

Set numpy array elements to zero if they are above a specific threshold

Generally, list comprehensions are faster than for loops in Python (the interpreter can skip much of the bookkeeping a general for loop requires):

a = [0 if a_ > thresh else a_ for a_ in a]

but, as @unutbu correctly pointed out, NumPy supports boolean mask indexing: an element-wise comparison yields a boolean array that can index the original, so:

super_threshold_indices = a > thresh
a[super_threshold_indices] = 0

would be even faster.
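The mask approach can also be written without mutating a, using np.where (my own variation):

```python
import numpy as np

a = np.array([1.0, 5.0, 2.0, 9.0])
thresh = 4.0

# In-place mask version from the answer, applied to a copy
b = a.copy()
b[b > thresh] = 0

# np.where returns a new array and leaves a untouched
c = np.where(a > thresh, 0, a)

assert (b == c).all()
assert c.tolist() == [1.0, 0.0, 2.0, 0.0]
```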

Generally, when applying operations to vectors of data, have a look at NumPy's ufuncs, which often perform much better than Python functions mapped over the data by any native mechanism.
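For instance (my own illustration), np.minimum is a ufunc: it clips values in one vectorized C-level call, where a Python-level loop touches each element individually.

```python
import numpy as np

a = np.array([1.0, 5.0, 2.0, 9.0])

# ufunc: evaluated in compiled code over the whole array
clipped = np.minimum(a, 4.0)
assert clipped.tolist() == [1.0, 4.0, 2.0, 4.0]

# Equivalent Python-level loop, much slower on large arrays
looped = np.array([min(v, 4.0) for v in a])
assert (clipped == looped).all()
```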


