Equivalent of Numpy.argsort() in basic python?
I timed the suggestions above and here are my results.
import timeit
import random
import numpy as np
def f(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#non-lambda version by Tony Veijalainen
return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]
def g(seq):
# http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3383106#3383106
#lambda version by Tony Veijalainen
return [x for x,y in sorted(enumerate(seq), key = lambda x: x[1])]
def h(seq):
#http://stackoverflow.com/questions/3382352/equivalent-of-numpy-argsort-in-basic-python/3382369#3382369
#by unutbu
return sorted(range(len(seq)), key=seq.__getitem__)
seq = list(range(10000))
random.shuffle(seq)
n_trials = 100
for cmd in [
'f(seq)', 'g(seq)', 'h(seq)', 'np.argsort(seq)',
'np.argsort(seq).tolist()'
]:
t = timeit.Timer(cmd, globals={**globals(), **locals()})
print('time for {:d}x {:}: {:.6f}'.format(n_trials, cmd, t.timeit(n_trials)))
output
time for 100x f(seq): 0.323915
time for 100x g(seq): 0.235183
time for 100x h(seq): 0.132787
time for 100x np.argsort(seq): 0.091086
time for 100x np.argsort(seq).tolist(): 0.104226
A problem size dependent analysis is given here.
Is it possible to use argsort in descending order?
If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n
highest elements are:
(-avgDists).argsort()[:n]
Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n
highest elements:
avgDists.argsort()[::-1][:n]
Both methods are O(n log n) in time complexity, because the argsort
call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.
Note that these methods do not always give equivalent results: if a stable sort implementation is requested to argsort
, e.g. by passing the keyword argument kind='mergesort'
, then the first strategy will preserve the sorting stability, but the second strategy will break stability (i.e. the positions of equal items will get reversed).
Example timings:
Using a small array of 100 floats and a length 30 tail, the view method was about 15% faster
>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For larger arrays, the argsort is dominant and there is no significant timing difference
>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.
Numpy argsort - what is it doing?
According to the documentation
Returns the indices that would sort an array.
2
is the index of0.0
.3
is the index of0.1
.1
is the index of1.41
.0
is the index of1.48
.
What is the javascript equivalent of numpy argsort?
You can use a Schwartzian transform also known as Decorate-Sort-Undecorate (DSU) in python.
DSU:
- Decorate - Use Array#Map to enrich each item in the array with the needed sort data
- Sort - sort using the added data
- Undecorate - extract the sorted data using Array#map again
Demo:
const dsu = (arr1, arr2) => arr1
.map((item, index) => [arr2[index], item]) // add the args to sort by
.sort(([arg1], [arg2]) => arg2 - arg1) // sort by the args
.map(([, item]) => item); // extract the sorted items
const clickCount = [5,2,4,3,1];
const imgUrl = ['1.jpg','2.jpg','3.jpg','4.jpg','5.jpg'];
const result = dsu(imgUrl, clickCount);
console.log(result);
Efficient Haskell equivalent to NumPy's argsort
Use vector-algorithms:
import Data.Ord (comparing)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Algorithms.Intro as VAlgo
argSort :: (Ord a, VU.Unbox a) => VU.Vector a -> VU.Vector Int
argSort xs = VU.map fst $ VU.create $ do
xsi <- VU.thaw $ VU.indexed xs
VAlgo.sortBy (comparing snd) xsi
return xsi
Note these are Unboxed
rather than Storable
vectors. The latter need to make some tradeoffs to allow impure C FFI operations and can't properly handle heterogeneous tuples. You can of course always convert
to and from storable vectors.
C++ - vector version implement of argsort low effiency compared to the one in numpy
I took your implementation and measured it with 10000000
items. It took approximated 1.7 seconds.
Now I introduced a class
class valuePair {
public:
valuePair(int idx, float value) : idx(idx), value(value){};
int idx;
float value;
};
with is initialized as
std::vector<valuePair> pairs;
for (int i = 0; i != 10000000; ++i) {
pairs.push_back(valuePair(i, (double)rand() / (RAND_MAX)));
}
and the sorting than is done with
std::sort(pairs.begin(), pairs.end(), [&](const valuePair &a, const valuePair &b) { return a.value < b.value; });
This code reduces the runtime down to 1.1 seconds. This is I think due to a better cache coherency, but still quite far away from the python results.
How to use numpy.argsort() as indices in more than 2 dimensions?
Here is a general method:
import numpy as np
array = np.array([[[ 0.81774634, 0.62078744],
[ 0.43912609, 0.29718462]],
[[ 0.1266578 , 0.82282054],
[ 0.98180375, 0.79134389]]])
a = 1 # or 0 or 2
order = array.argsort(axis=a)
idx = np.ogrid[tuple(map(slice, array.shape))]
# if you don't need full ND generality: in 3D this can be written
# much more readable as
# m, n, k = array.shape
# idx = np.ogrid[:m, :n, :k]
idx[a] = order
print(np.all(array[idx] == np.sort(array, axis=a)))
Output:
True
Explanation: We must specify for each element of the output array the complete index of the corresponding element of the input array. Thus each index into the input array has the same shape as the output array or must be broadcastable to that shape.
The indices for the axes along which we do not sort/argsort stay in place. We therefore need to pass a broadcastable range(array.shape[i]) for each of those. The easiest way is to use ogrid to create such a range for all dimensions (If we used this directly, the array would come back unchanged.) and then replace the index correspondingg to the sort axis with the output of argsort
.
UPDATE March 2019:
Numpy is becoming more strict in enforcing multi-axis indices being passed as tuples. Currently, array[idx]
will trigger a deprecation warning. To be future proof use array[tuple(idx)]
instead. (Thanks @Nathan)
Or use numpy's new (version 1.15.0) convenience function take_along_axis
:
np.take_along_axis(array, order, a)
Sort invariant for numpy.argsort with multiple dimensions
The numpy issue #8708 has a sample implementation of take_along_axis that does what I need; I'm not sure if it's efficient for large arrays but it seems to work.
def take_along_axis(arr, ind, axis):
"""
... here means a "pack" of dimensions, possibly empty
arr: array_like of shape (A..., M, B...)
source array
ind: array_like of shape (A..., K..., B...)
indices to take along each 1d slice of `arr`
axis: int
index of the axis with dimension M
out: array_like of shape (A..., K..., B...)
out[a..., k..., b...] = arr[a..., inds[a..., k..., b...], b...]
"""
if axis < 0:
if axis >= -arr.ndim:
axis += arr.ndim
else:
raise IndexError('axis out of range')
ind_shape = (1,) * ind.ndim
ins_ndim = ind.ndim - (arr.ndim - 1) #inserted dimensions
dest_dims = list(range(axis)) + [None] + list(range(axis+ins_ndim, ind.ndim))
# could also call np.ix_ here with some dummy arguments, then throw those results away
inds = []
for dim, n in zip(dest_dims, arr.shape):
if dim is None:
inds.append(ind)
else:
ind_shape_dim = ind_shape[:dim] + (-1,) + ind_shape[dim+1:]
inds.append(np.arange(n).reshape(ind_shape_dim))
return arr[tuple(inds)]
which yields
>>> A = np.array([[3,2,1],[4,0,6]])
>>> B = np.array([[3,1,4],[1,5,9]])
>>> i = A.argsort(axis=-1)
>>> take_along_axis(A,i,axis=-1)
array([[1, 2, 3],
[0, 4, 6]])
>>> take_along_axis(B,i,axis=-1)
array([[4, 1, 3],
[5, 1, 9]])
Related Topics
<Django Object > Is Not JSON Serializable
Datetime Timezone Conversion Using Pytz
Is the Shortcircuit Behaviour of Python's Any/All Explicit
Pandas Concat Generates Nan Values
Inserting a Table Name into a Query Gives SQLite3.Operationalerror: Near "": Syntax Error
What Is the Point of Setlevel in a Python Logging Handler
Link Several Popen Commands with Pipes
Generate Permutations of List with Repeated Elements
Python 2.X - Write Binary Output to Stdout
Can You Give a Django App a Verbose Name for Use Throughout the Admin
Efficient Way to Add Spaces Between Characters in a String
High Performance Fuzzy String Comparison in Python, Use Levenshtein or Difflib