How to Use Argsort in Descending Order

Is it possible to use argsort in descending order?

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.
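
The copy-versus-view distinction can be checked directly. This is a minimal sketch, assuming a small hypothetical avgDists array; np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

avgDists = np.array([0.3, 0.9, 0.1, 0.7, 0.5])  # hypothetical sample data
n = 3

negated = -avgDists          # allocates a new array of the same size
order = avgDists.argsort()   # indices in ascending order
top_n = order[::-1][:n]      # reversed and truncated view of `order`

copy_shares = np.shares_memory(negated, avgDists)  # the negation is a separate copy
view_shares = np.shares_memory(top_n, order)       # the slice reuses `order`'s buffer
print(copy_shares, view_shares)  # False True
```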

Note that these methods do not always give equivalent results: if you request a stable sort from argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy preserves sorting stability, but the second strategy breaks it (i.e. the positions of equal items get reversed).
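
A small illustration of that difference, assuming a hypothetical array with one pair of equal values (kind='stable' is used here and also selects a stable algorithm):

```python
import numpy as np

a = np.array([1, 2, 2, 3])  # hypothetical data; the two 2s sit at indices 1 and 2

by_negation = (-a).argsort(kind='stable')     # ties keep their original order
by_reversal = a.argsort(kind='stable')[::-1]  # ties come out reversed

print(by_negation)  # [3 1 2 0]
print(by_reversal)  # [3 2 1 0]
```

Both results are valid descending orderings; they differ only in how the tied elements at indices 1 and 2 are arranged.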

Example timings:

Using a small array of 100 floats and a tail of length 30, the view method was about 15% faster:

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort is dominant and there is no significant timing difference:

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that, contrary to a claim made in the comments (by nedim), whether you truncate before or after reversing makes no difference in efficiency: both operations only stride a view of the array differently and do not actually copy data.
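
As a quick check, the two slice orders can be compared on a small array. This is a sketch with hypothetical sample data, not a benchmark:

```python
import numpy as np

avgDists = np.array([0.3, 0.9, 0.1, 0.7, 0.5])  # hypothetical sample data
n = 3

order = avgDists.argsort()
reverse_then_cut = order[::-1][:n]
cut_then_reverse = order[-n:][::-1]

print(reverse_then_cut)  # [1 3 4]
print(cut_then_reverse)  # [1 3 4]

# Both results are views onto the same argsort output
both_views = (np.shares_memory(reverse_then_cut, order)
              and np.shares_memory(cut_then_reverse, order))
```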

What is the sorting direction (ascending or descending) of numpy.argsort?

np.argsort, like most sorting routines, sorts in ascending order unless something else is specified. You can always reverse the output yourself to get the indices in descending order:

import numpy as np
x = np.array([3, 1, 2])
ascending = np.argsort(x)
descending = ascending[::-1]

For more information on the sorting direction of np.argsort, have a look at the post "Is it possible to use argsort in descending order?"

EDIT: I found a reference to the sort order of numpy here: https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort where the results are in ascending order and it is mentioned that complex numbers are sorted in lexicographic order (by real part first, then by imaginary part).

How to sort in descending order with numpy?

Just multiply your matrix by -1 to reverse order:

[In]: A = np.array([[1, 3, 2, 7],
                    [2, 4, 1, 3],
                    [6, 1, 2, 3]])
[In]: print( np.argsort(-A) )
[Out]: [[3 1 2 0]
        [1 3 0 2]
        [0 3 2 1]]

Reverse sort and argsort in python

I don't think there's any real need to skip the toarray. The v array will be only n_docs long, which is dwarfed by the size of the n_docs × n_terms tf-idf matrix in practical situations. Also, it will be quite dense since any term shared by two documents will give them a non-zero similarity. Sparse matrix representations only pay off when the matrix you're storing is very sparse (I've seen >80% figures for Matlab and assume that Scipy will be similar, though I don't have an exact figure).

The double sort can be skipped by doing

v = v.toarray()
vi = np.argsort(v, axis=0)[::-1]
vs = v[vi]
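
The single-argsort idea can be tried on its own with a plain dense array. This is a sketch under assumptions: the scores are hypothetical, and the column returned by toarray() is flattened to 1-D with ravel() first so that the indexing yields a 1-D result.

```python
import numpy as np

# Hypothetical dense column of similarity scores, shaped like a toarray() result
v = np.array([[0.2], [0.9], [0.5]])
v = v.ravel()             # flatten to 1-D

vi = np.argsort(v)[::-1]  # indices in descending order of score
vs = v[vi]                # scores in descending order, no second sort needed

print(vi)  # [1 2 0]
print(vs)  # [0.9 0.5 0.2]
```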

Btw., your use of np.inner on sparse matrices is not going to work with the latest versions of NumPy; the safe way of taking an inner product of two sparse matrices is

v = (tfidf * tfidf[idx, :]).transpose()

NumPy - descending stable arg-sort of arrays of any dtype

I think this formula should work:

import numpy as np
a = np.array([1, 2, 2, 3, 3, 3])
s = len(a) - 1 - np.argsort(a[::-1], kind='stable')[::-1]
print(s)
# [3 4 5 1 2 0]
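
Since the formula never negates the values, it should also work for dtypes that cannot be negated. A small sketch wrapping it in a helper (the name stable_argsort_desc is made up for illustration) and applying it to strings:

```python
import numpy as np

def stable_argsort_desc(a):
    """Descending argsort in which equal items keep their original order."""
    a = np.asarray(a)
    return len(a) - 1 - np.argsort(a[::-1], kind='stable')[::-1]

words = np.array(['b', 'a', 'b'])  # hypothetical data with a tie
idx = stable_argsort_desc(words)
print(idx)  # [0 2 1]
```

The two 'b' entries come out as indices 0 and then 2, i.e. in their original relative order.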

Python: argsort in descending order for 2d array?

For this particular requirement you can use Python's sorted, which is stable:

a =  np.array([[2, 3], [1998,5], [1998,7]])
res = np.array(sorted(a, key= lambda x: -x[0]))

What it does: it uses the first element of each row for comparison (via the lambda accessor) and negates it for decreasing order. Because the sort is stable, rows whose first elements are equal keep their original order.

Output:

[[1998    5]
 [1998    7]
 [   2    3]]

EDIT: btw if you wanted to sort by the following columns whenever the preceding ones are identical (all with the same ordering):

a =  np.array([[2, 3], [1998,5], [1998,7]])
res = np.array(sorted(a, key=lambda x:(-x).tolist()))

This converts the rows to lists and then uses sequence comparison. Note that in this example the rows are sorted in decreasing order (hence the (-x)).
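
For comparison, the first variant can also be written without leaving NumPy. This is an alternative sketch, not the method above: it swaps Python's sorted for a stable argsort on the negated first column.

```python
import numpy as np

a = np.array([[2, 3], [1998, 5], [1998, 7]])

# Stable argsort on the negated first column: descending by column 0,
# with tied rows kept in their original order
order = np.argsort(-a[:, 0], kind='stable')
res = a[order]

print(res)
# [[1998    5]
#  [1998    7]
#  [   2    3]]
```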

Descending sorting in numpy by several columns

Use numpy.lexsort to sort on multiple columns at the same time.

arr = np.array([
    [150, 8],
    [105, 20],
    [90, 100],
    [101, 12],
    [110, 80],
    [105, 100],
])

order = np.lexsort([arr[:, 1], arr[:, 0]])[::-1]
arr[order]

yields:

array([[150,   8],
       [110,  80],
       [105, 100],
       [105,  20],
       [101,  12],
       [ 90, 100]])
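
Reversing the lexsort result flips the direction of every key at once. If only some columns should be descending, one option for numeric data is to negate those keys instead. A sketch using the same array, with the first column descending and the second ascending:

```python
import numpy as np

arr = np.array([
    [150, 8],
    [105, 20],
    [90, 100],
    [101, 12],
    [110, 80],
    [105, 100],
])

# lexsort treats the LAST key as the primary one.
# Negating column 0 makes it descending; column 1 stays ascending.
order = np.lexsort([arr[:, 1], -arr[:, 0]])
mixed = arr[order]

print(mixed)
# [[150   8]
#  [110  80]
#  [105  20]
#  [105 100]
#  [101  12]
#  [ 90 100]]
```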

undo or reverse argsort(), python

I'm not sure how best to do it in numpy, but, in pure Python, the reasoning would be:

aargsort is holding a permutation of range(len(a)) telling you where the items of aSort came from -- much like, in pure Python:

>>> x = list('ciaobelu')
>>> r = sorted(range(len(x)), key=x.__getitem__)  # range objects cannot be sorted in place in Python 3
>>> r
[2, 4, 0, 5, 1, 6, 3, 7]
>>>

i.e., the first element of sorted(x) will be x[2], the second one x[4], and so forth.

So given the sorted version, you can reconstruct the original by "putting items back where they came from":

>>> s = sorted(x)
>>> s
['a', 'b', 'c', 'e', 'i', 'l', 'o', 'u']
>>> original = [None] * len(s)
>>> for i, c in zip(r, s): original[i] = c
... 
>>> original
['c', 'i', 'a', 'o', 'b', 'e', 'l', 'u']
>>>

Of course there are going to be tighter and faster ways to express this in numpy (which unfortunately I don't know inside-out as much as I know Python itself;-), but I hope this helps by showing the underlying logic of the "putting things back in place" operation you need to perform.
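
One way the same "put items back where they came from" logic might look in NumPy, as a sketch with hypothetical data. Assigning through the argsort indices restores the original order, and argsort of the permutation gives its inverse:

```python
import numpy as np

a = np.array([30, 10, 40, 20])  # hypothetical data
order = np.argsort(a)           # [1 3 0 2]: where each sorted item came from
a_sorted = a[order]             # [10 20 30 40]

# Put every sorted item back at the position it came from
restored = np.empty_like(a_sorted)
restored[order] = a_sorted

# Equivalent: index the sorted array with the inverse permutation
inverse = np.argsort(order)     # [2 0 3 1]
restored_too = a_sorted[inverse]

print(restored)  # [30 10 40 20]
```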

Numpy sorting 2d array by descending and take first N from each row

Using numpy.argsort() returns an array of indices for the sorted array. As such, what your out_arr1 lets you know is where on each row to find the highest values.

If you are to continue this way, what you would need to do is for each row in in_arr (hereby written as in_arr[i]) take values found at the first 3 indices in out_arr1[i].

What that means is that out_arr1[i, 0] tells you where the highest value in in_arr on row i is located. In our case, out_arr1[0, 0] = 3, which means the highest value in row 0 is 40 (on index 3)

Doing this, the positions of the 3 largest numbers on each row are given by out_arr1[0, 0], out_arr1[0, 1], out_arr1[0, 2] and out_arr1[1, 0], out_arr1[1, 1], out_arr1[1, 2].

To get the desired output, we would need something along the lines of:

final_arr = numpy.array([in_arr[0, out_arr1[0, 0]], in_arr[0, out_arr1[0, 1]], in_arr[0, out_arr1[0, 2]],
                         in_arr[1, out_arr1[1, 0]], in_arr[1, out_arr1[1, 1]], in_arr[1, out_arr1[1, 2]]])

This however, is less than elegant, and there is another, easier solution to your problem.

Using numpy.sort() instead of numpy.argsort() we can return the exact values of in_arr sorted along an axis. By doing that, we no longer need to use an output index to find our 3 highest values, as they are the first 3 in our new output.

Considering out_arr2 as the output from numpy.sort(), the final array would look like:

final_arr = numpy.array([[out_arr2[0, 0], out_arr2[0, 1], out_arr2[0, 2]], [out_arr2[1, 0], out_arr2[1, 1], out_arr2[1, 2]]])
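
Both routes can be written more compactly with slicing. This is a sketch under assumptions: the contents of in_arr are made up for illustration, and out_arr1 is assumed to hold the descending argsort of each row. numpy.take_along_axis picks the values at the given indices row by row.

```python
import numpy as np

# Hypothetical input; the original in_arr is not shown here
in_arr = np.array([[10, 30, 20, 40],
                   [5, 50, 15, 25]])

# Route 1: descending argsort per row, then gather the first 3 values
out_arr1 = np.argsort(-in_arr, axis=1)
top3_by_index = np.take_along_axis(in_arr, out_arr1[:, :3], axis=1)

# Route 2: sort the values directly in descending order and slice
top3_by_sort = -np.sort(-in_arr, axis=1)[:, :3]

print(top3_by_index)
# [[40 30 20]
#  [50 25 15]]
```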