A Fast Way to Find the Largest N Elements in a NumPy Array

A fast way to find the largest N elements in a NumPy array

The bottleneck module has a fast partial sort function that works directly with NumPy arrays: bottleneck.partition().

Note that bottleneck.partition() returns the values themselves, partitioned rather than fully sorted. If you want indices instead (the kind numpy.argsort() returns), use bottleneck.argpartition().

I've benchmarked:

  • z = -bottleneck.partition(-a, 10)[:10]
  • z = a.argsort()[-10:]
  • z = heapq.nlargest(10, a)

where a is a random 1,000,000-element array.

The timings were as follows:

  • bottleneck.partition(): 25.6 ms per loop
  • np.argsort(): 198 ms per loop
  • heapq.nlargest(): 358 ms per loop
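A comparison along these lines can be sketched with timeit. This is a rough sketch, not the original benchmark: it swaps in np.partition for bottleneck.partition() so it runs without bottleneck installed, uses a smaller array to keep the run short, and the helper names (top_partition, top_argsort, top_heapq) are made up for illustration. Absolute timings will differ by machine and library version.

```python
import heapq
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100_000)  # assumption: smaller than the 1,000,000-element array above
k = 10

def top_partition(arr, k):
    # Partial sort: the k largest values land in the last k slots, in no particular order.
    return np.partition(arr, -k)[-k:]

def top_argsort(arr, k):
    # Full sort of the indices, then keep the values at the last k of them.
    return arr[arr.argsort()[-k:]]

def top_heapq(arr, k):
    # Pure-Python heap over the array's elements; returns a list, largest first.
    return heapq.nlargest(k, arr)

for fn in (top_partition, top_argsort, top_heapq):
    seconds = timeit.timeit(lambda: fn(a, k), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

All three return the same ten values; only the order and the container type differ.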

How do I get indices of N maximum values in a NumPy array?

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])

>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])

>>> top4 = a[ind]
>>> top4
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.
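The two steps can be wrapped in a small helper. This is a minimal sketch; the name top_k_indices is hypothetical, and it assumes a 1-D array with 1 <= k <= len(a).

```python
import numpy as np

def top_k_indices(a, k):
    """Return the indices of the k largest values of 1-D array a, largest first.

    Assumes 1 <= k <= len(a).
    """
    ind = np.argpartition(a, -k)[-k:]   # O(n): the k largest, in arbitrary order
    order = np.argsort(a[ind])[::-1]    # O(k log k): order only those k, descending
    return ind[order]

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
idx = top_k_indices(a, 4)
print(a[idx])  # [9 9 6 4]
```

When values are tied (the three 4s here), which of the tied indices is returned is not guaranteed, so compare values rather than indices when checking the result.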

Quickest way to find the nth largest value in a NumPy matrix

You can flatten the matrix and then sort it:

>>> k = np.array([[ 35,  48,  63],
...               [ 60,  77,  96],
...               [ 91, 112, 135]])
>>> flat = k.flatten()
>>> flat.sort()
>>> flat
array([ 35, 48, 60, 63, 77, 91, 96, 112, 135])
>>> flat[-2]
112
>>> flat[-3]
96
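If only a single nth value is needed, a full sort does more work than necessary. The sketch below uses np.partition instead, which is a different technique from the sort shown above; nth_largest is a hypothetical helper name, and duplicates are counted as separate entries.

```python
import numpy as np

def nth_largest(matrix, n):
    """Return the n-th largest value (n=1 is the maximum) of an array of any shape."""
    flat = np.ravel(matrix)
    # After partitioning at position -n, the element there is the one that
    # would occupy that position in a fully sorted array.
    return np.partition(flat, -n)[-n]

k = np.array([[35, 48, 63],
              [60, 77, 96],
              [91, 112, 135]])
print(nth_largest(k, 2))  # 112
print(nth_largest(k, 3))  # 96
```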

N largest values in each row of ndarray

You can use np.partition the same way as in the question you linked, since it already operates along the last axis by default:

In [2]: a = np.array([[ 5,  4,  3,  2,  1],
   ...:               [10,  9,  8,  7,  6]])
In [3]: b = np.partition(a, -3)  # top 3 values from each row
In [4]: b[:, -3:]
Out[4]:
array([[ 3,  4,  5],
       [ 8,  9, 10]])
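If the column positions are wanted as well as the values, np.argpartition can be combined with np.take_along_axis (available in NumPy 1.15 and later). This is a sketch under those assumptions, with variable names chosen for illustration:

```python
import numpy as np

a = np.array([[5, 4, 3, 2, 1],
              [10, 9, 8, 7, 6]])
n = 3

# Column indices of the n largest values in each row (unordered within a row).
idx = np.argpartition(a, -n, axis=1)[:, -n:]

# Gather the values those indices point to, row by row.
top = np.take_along_axis(a, idx, axis=1)

# Optional: order each row's results from largest to smallest.
order = np.argsort(top, axis=1)[:, ::-1]
idx_sorted = np.take_along_axis(idx, order, axis=1)
top_sorted = np.take_along_axis(top, order, axis=1)

print(idx_sorted)  # [[0 1 2]
                   #  [0 1 2]]
print(top_sorted)  # [[ 5  4  3]
                   #  [10  9  8]]
```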

How to get the indices of the largest n values in a multi-dimensional NumPy array

I don't have access to bottleneck, so this example uses argsort, but you should be able to use bottleneck's functions in the same way:

#!/usr/bin/env python
import numpy as np
N = 4
a = np.random.random(20).reshape(4, 5)
print(a)

# Convert it into a 1D array
a_1d = a.flatten()

# Find the indices in the 1D array
idx_1d = a_1d.argsort()[-N:]

# convert the idx_1d back into indices arrays for each dimension
x_idx, y_idx = np.unravel_index(idx_1d, a.shape)

# Check that we got the largest values.
for x, y in zip(x_idx, y_idx):
    print(a[x][y])

How to get first 5 maximum values from numpy array in python?

You can do this (each step is commented for clarity):

import numpy as np
x = np.array([3, 4, 2, 1, 7, 8, 6, 5, 9])

y = np.sort(x)  # np.sort returns a sorted copy, so x itself is left unchanged
y = y[::-1]     # reverse into descending order
y = y[:5]       # take a slice of the first 5
print(y)

The result:

[9 8 7 6 5]

Get the indices of N highest values in an ndarray

You can use numpy.argpartition on a flattened version of the array to get the indices of the top k items, and then convert those 1D indices to match the array's shape using numpy.unravel_index:

>>> arr = np.arange(100*100*100).reshape(100, 100, 100)
>>> np.random.shuffle(arr)
>>> indices = np.argpartition(arr.flatten(), -2)[-2:]
>>> np.vstack(np.unravel_index(indices, arr.shape)).T
array([[97, 99, 98],
       [97, 99, 99]])
>>> arr[97][99][98]
999998
>>> arr[97][99][99]
999999
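The same idea can be packaged as a helper that also orders the results. This is a sketch with a hypothetical name (top_k_nd); it uses rng.permutation rather than np.random.shuffle, because shuffle only reorders an array along its first axis.

```python
import numpy as np

def top_k_nd(arr, k):
    """Return a (k, arr.ndim) array of index tuples for the k largest values, largest first."""
    flat = arr.ravel()                        # a view when possible, unlike flatten()
    idx = np.argpartition(flat, -k)[-k:]      # flat indices of the k largest, unordered
    idx = idx[np.argsort(flat[idx])[::-1]]    # order the k winners, largest first
    return np.column_stack(np.unravel_index(idx, arr.shape))

rng = np.random.default_rng(0)
arr = rng.permutation(4 * 5 * 6).reshape(4, 5, 6)  # every value 0..119 appears once
coords = top_k_nd(arr, 2)
print(coords)                    # one row of (i, j, k) per value
print(arr[tuple(coords.T)])      # [119 118]
```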

Python: Find the indices of the top-n largest values in a list or numpy.ndarray

Sort the indices by value:

def most_find(sequence, n):
    # Sort the positions 0..len-1 by the value found at each one, largest first.
    lst = sorted(range(len(sequence)), key=lambda x: sequence[x], reverse=True)
    return lst[:n]

a = [1, 5, 6, 2, 3]
result = most_find(a, 3)

print(result)
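For a plain list where n is much smaller than the length, the standard library's heapq.nlargest avoids sorting every index. This is an alternative sketch, not the approach above; most_find_heap is a hypothetical name.

```python
import heapq

def most_find_heap(sequence, n):
    """Return the indices of the n largest items of sequence, largest first."""
    # Rank the positions 0..len-1 by the value stored at each position.
    return heapq.nlargest(n, range(len(sequence)), key=sequence.__getitem__)

a = [1, 5, 6, 2, 3]
print(most_find_heap(a, 3))  # [2, 1, 4]
```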

