Equivalent of Numpy.Argsort() in Basic Python

Equivalent of Numpy.argsort() in basic python?

I timed the suggestions above and here are my results.

import timeit
import random
import numpy as np

def f(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#non-lambda version by Tony Veijalainen
return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]

def g(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#lambda version by Tony Veijalainen
return [x for x,y in sorted(enumerate(seq), key = lambda x: x[1])]

def h(seq):
#http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3382369#3382369
#by unutbu
return sorted(range(len(seq)), key=seq.__getitem__)

seq = list(range(10000))
random.shuffle(seq)

n_trials = 100
for cmd in [
'f(seq)', 'g(seq)', 'h(seq)', 'np.argsort(seq)',
'np.argsort(seq).tolist()'
]:
t = timeit.Timer(cmd, globals={**globals(), **locals()})
print('time for {:d}x {:}: {:.6f}'.format(n_trials, cmd, t.timeit(n_trials)))

output

time for 100x f(seq): 0.323915
time for 100x g(seq): 0.235183
time for 100x h(seq): 0.132787
time for 100x np.argsort(seq): 0.091086
time for 100x np.argsort(seq).tolist(): 0.104226

A problem size dependent analysis is given here.

Is it possible to use argsort in descending order?

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).

Example timings:

Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort is dominant and there is no significant timing difference

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.

Numpy argsort - what is it doing?

According to the documentation

Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.

What is the javascript equivalent of numpy argsort?

You can use a Schwartzian transform also known as Decorate-Sort-Undecorate (DSU) in python.

DSU:

  1. Decorate - Use Array#Map to enrich each item in the array with the needed sort data
  2. Sort - sort using the added data
  3. Undecorate - extract the sorted data using Array#map again

Demo:

const dsu = (arr1, arr2) => arr1
.map((item, index) => [arr2[index], item]) // add the args to sort by
.sort(([arg1], [arg2]) => arg2 - arg1) // sort by the args
.map(([, item]) => item); // extract the sorted items

const clickCount = [5,2,4,3,1];
const imgUrl = ['1.jpg','2.jpg','3.jpg','4.jpg','5.jpg'];

const result = dsu(imgUrl, clickCount);

console.log(result);

Efficient Haskell equivalent to NumPy's argsort

Use vector-algorithms:

import Data.Ord (comparing)

import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Algorithms.Intro as VAlgo

argSort :: (Ord a, VU.Unbox a) => VU.Vector a -> VU.Vector Int
argSort xs = VU.map fst $ VU.create $ do
xsi <- VU.thaw $ VU.indexed xs
VAlgo.sortBy (comparing snd) xsi
return xsi

Note these are Unboxed rather than Storable vectors. The latter need to make some tradeoffs to allow impure C FFI operations and can't properly handle heterogeneous tuples. You can of course always convert to and from storable vectors.

C++ - vector version implement of argsort low effiency compared to the one in numpy

I took your implementation and measured it with 10000000 items. It took approximated 1.7 seconds.

Now I introduced a class

class valuePair {
public:
valuePair(int idx, float value) : idx(idx), value(value){};
int idx;
float value;
};

with is initialized as

std::vector<valuePair> pairs;
for (int i = 0; i != 10000000; ++i) {
pairs.push_back(valuePair(i, (double)rand() / (RAND_MAX)));
}

and the sorting than is done with

std::sort(pairs.begin(), pairs.end(), [&](const valuePair &a, const valuePair &b) { return a.value < b.value; });

This code reduces the runtime down to 1.1 seconds. This is I think due to a better cache coherency, but still quite far away from the python results.

How to use numpy.argsort() as indices in more than 2 dimensions?

Here is a general method:

import numpy as np

array = np.array([[[ 0.81774634, 0.62078744],
[ 0.43912609, 0.29718462]],
[[ 0.1266578 , 0.82282054],
[ 0.98180375, 0.79134389]]])

a = 1 # or 0 or 2

order = array.argsort(axis=a)

idx = np.ogrid[tuple(map(slice, array.shape))]
# if you don't need full ND generality: in 3D this can be written
# much more readable as
# m, n, k = array.shape
# idx = np.ogrid[:m, :n, :k]

idx[a] = order

print(np.all(array[idx] == np.sort(array, axis=a)))

Output:

True

Explanation: We must specify for each element of the output array the complete index of the corresponding element of the input array. Thus each index into the input array has the same shape as the output array or must be broadcastable to that shape.

The indices for the axes along which we do not sort/argsort stay in place. We therefore need to pass a broadcastable range(array.shape[i]) for each of those. The easiest way is to use ogrid to create such a range for all dimensions (If we used this directly, the array would come back unchanged.) and then replace the index correspondingg to the sort axis with the output of argsort.

UPDATE March 2019:

Numpy is becoming more strict in enforcing multi-axis indices being passed as tuples. Currently, array[idx] will trigger a deprecation warning. To be future proof use array[tuple(idx)] instead. (Thanks @Nathan)

Or use numpy's new (version 1.15.0) convenience function take_along_axis:

np.take_along_axis(array, order, a)

Sort invariant for numpy.argsort with multiple dimensions

The numpy issue #8708 has a sample implementation of take_along_axis that does what I need; I'm not sure if it's efficient for large arrays but it seems to work.

def take_along_axis(arr, ind, axis):
"""
... here means a "pack" of dimensions, possibly empty

arr: array_like of shape (A..., M, B...)
source array
ind: array_like of shape (A..., K..., B...)
indices to take along each 1d slice of `arr`
axis: int
index of the axis with dimension M

out: array_like of shape (A..., K..., B...)
out[a..., k..., b...] = arr[a..., inds[a..., k..., b...], b...]
"""
if axis < 0:
if axis >= -arr.ndim:
axis += arr.ndim
else:
raise IndexError('axis out of range')
ind_shape = (1,) * ind.ndim
ins_ndim = ind.ndim - (arr.ndim - 1) #inserted dimensions

dest_dims = list(range(axis)) + [None] + list(range(axis+ins_ndim, ind.ndim))

# could also call np.ix_ here with some dummy arguments, then throw those results away
inds = []
for dim, n in zip(dest_dims, arr.shape):
if dim is None:
inds.append(ind)
else:
ind_shape_dim = ind_shape[:dim] + (-1,) + ind_shape[dim+1:]
inds.append(np.arange(n).reshape(ind_shape_dim))

return arr[tuple(inds)]

which yields

>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = A.argsort(axis=-1)
>>> take_along_axis(A,i,axis=-1)
array([[1, 2, 3],
[0, 4, 6]])
>>> take_along_axis(B,i,axis=-1)
array([[4, 1, 3],
[5, 1, 9]])


Related Topics



Leave a reply



Submit