A fast way to find the largest N elements in a NumPy array
The bottleneck module has a fast partial sort method that works directly with NumPy arrays: bottleneck.partition().
Note that bottleneck.partition() returns the actual values, partially sorted. If you want the indices of those values (the counterpart of what numpy.argsort() returns), use bottleneck.argpartition() instead.
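To make the values-versus-indices distinction concrete, here is a minimal sketch. It uses NumPy's own np.partition and np.argpartition as stand-ins, on the assumption that they follow the same kth convention; the array contents are made up for illustration.

```python
import numpy as np

a = np.array([7, 1, 9, 3, 8, 2])
k = 3  # how many of the largest elements we want

# partition returns values: the last k slots hold the k largest, in no guaranteed order
values = np.partition(a, -k)[-k:]

# argpartition returns positions into `a` rather than the values themselves
indices = np.argpartition(a, -k)[-k:]

print(sorted(values.tolist()))      # [7, 8, 9]
print(sorted(a[indices].tolist()))  # [7, 8, 9]
```

Both results are sorted before printing because the order inside the top-k slice is not guaranteed.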
I've benchmarked:
z = -bottleneck.partition(-a, 10)[:10]
z = a.argsort()[-10:]
z = heapq.nlargest(10, a)
where a is a random 1,000,000-element array.
The timings were as follows:
bottleneck.partition(): 25.6 ms per loop
np.argsort(): 198 ms per loop
heapq.nlargest(): 358 ms per loop
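If you want to repeat a comparison like this yourself, the following is a rough harness, not the original benchmark. It assumes np.partition in place of bottleneck.partition(), uses a smaller array so it finishes quickly, and the function names are my own. Absolute numbers will differ by machine and library version.

```python
import heapq
import timeit

import numpy as np

rng = np.random.default_rng(0)  # seeded only so runs are repeatable (an assumption)
a = rng.random(100_000)         # smaller than the 1,000,000 elements used above
k = 10

def by_partition():
    # Partial sort of the negated array puts the k largest (negated) values first
    return -np.partition(-a, k)[:k]

def by_argsort():
    # Full sort of the indices, then look up the values at the last k positions
    return a[a.argsort()[-k:]]

def by_heapq():
    # Pure-Python heap; iterates over the array element by element
    return heapq.nlargest(k, a)

# Sanity check: all three approaches agree on the set of top-k values
expected = np.sort(a)[-k:]
for fn in (by_partition, by_argsort, by_heapq):
    assert np.allclose(np.sort(np.asarray(fn())), expected)

for fn in (by_partition, by_argsort, by_heapq):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```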
How do I get indices of N maximum values in a NumPy array?
Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do
>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> top4 = a[ind]
>>> top4
array([4, 9, 6, 9])
Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:
>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])
To get the top-k elements in sorted order in this way takes O(n + k log k) time.
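The two steps can be wrapped in a small helper. This is only a sketch: the name top_k_indices is mine, it returns the largest value first (the reverse of the order shown above), and it assumes a 1-D input with 1 <= k <= len(a).

```python
import numpy as np

def top_k_indices(a, k):
    """Return the indices of the k largest entries of a 1-D array, largest first."""
    a = np.asarray(a)
    ind = np.argpartition(a, -k)[-k:]   # O(n): unordered indices of the top k
    order = np.argsort(a[ind])[::-1]    # O(k log k): rank only those k values
    return ind[order]

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
ind = top_k_indices(a, 4)
print(a[ind])  # [9 9 6 4]
```

Which of the tied entries is reported (for example, which of the three 4s) is not specified, so compare values rather than indices when ties are possible.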
Quickest way to find the nth largest value in a numpy Matrix
You can flatten the matrix and then sort it:
>>> k = np.array([[ 35,  48,  63],
...               [ 60,  77,  96],
...               [ 91, 112, 135]])
>>> flat=k.flatten()
>>> flat.sort()
>>> flat
array([ 35, 48, 60, 63, 77, 91, 96, 112, 135])
>>> flat[-2]
112
>>> flat[-3]
96
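If only a single n-th largest value is needed, a full sort is more work than necessary. A possible alternative, sketched here with np.partition (a different technique from the flatten-and-sort approach above; the helper name nth_largest is hypothetical), looks like this:

```python
import numpy as np

def nth_largest(matrix, n):
    """Return the n-th largest value (n=1 is the maximum) of an array of any shape."""
    flat = np.asarray(matrix).ravel()
    # After partitioning with kth=-n, position -n holds the value a full sort would put there
    return np.partition(flat, -n)[-n]

k = np.array([[35, 48, 63],
              [60, 77, 96],
              [91, 112, 135]])
print(nth_largest(k, 2))  # 112
print(nth_largest(k, 3))  # 96
```

np.partition returns a copy, so the original matrix is left untouched.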
N largest values in each row of ndarray
You can use np.partition in the same way as in the question you linked: by default the partitioning is already done along the last axis, so each row is handled independently:
In [2]: a = np.array([[ 5, 4, 3, 2, 1],
   ...:               [10, 9, 8, 7, 6]])
In [3]: b = np.partition(a, -3) # top 3 values from each row
In [4]: b[:,-3:]
Out[4]:
array([[ 3,  4,  5],
       [ 8,  9, 10]])
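If you also need the column indices of those values, or want each row's result in a guaranteed order, something along these lines should work. It is a sketch that assumes np.take_along_axis is available (NumPy 1.15 and later); the variable names are mine.

```python
import numpy as np

a = np.array([[5, 4, 3, 2, 1],
              [10, 9, 8, 7, 6]])
n = 3

# Column indices of the n largest values in each row (unordered within the row)
idx = np.argpartition(a, -n, axis=1)[:, -n:]
top = np.take_along_axis(a, idx, axis=1)

# Order each row's n values from largest to smallest
order = np.argsort(top, axis=1)[:, ::-1]
idx_sorted = np.take_along_axis(idx, order, axis=1)
top_sorted = np.take_along_axis(top, order, axis=1)

print(top_sorted)  # rows: [5, 4, 3] and [10, 9, 8]
print(idx_sorted)  # rows: [0, 1, 2] and [0, 1, 2]
```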
how to get the index of the largest n values in a multi-dimensional numpy array
I don't have access to bottleneck, so in this example I am using argsort, but you should be able to use it in the same way:
#!/usr/bin/env python
import numpy as np
N = 4
a = np.random.random(20).reshape(4, 5)
print(a)
# Convert it into a 1D array
a_1d = a.flatten()
# Find the indices in the 1D array
idx_1d = a_1d.argsort()[-N:]
# convert the idx_1d back into indices arrays for each dimension
x_idx, y_idx = np.unravel_index(idx_1d, a.shape)
# Check that we got the largest values.
for x, y in zip(x_idx, y_idx):
    print(a[x][y])
How to get first 5 maximum values from numpy array in python?
You can do this (each step is commented for clarity):
import numpy as np
x = np.array([3, 4, 2, 1, 7, 8, 6, 5, 9])
y = x.copy() # <----optional, create a copy of the array
y = np.sort(x) # sort array
y = y[::-1] # reverse sort order
y = y[0:5] # take a slice of the first 5
print(y)
The result:
[9 8 7 6 5]
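For large arrays, sorting everything just to keep five values does extra work. A possible variant, assuming np.partition is acceptable here (it is not part of the answer above), selects the top n first and sorts only those:

```python
import numpy as np

x = np.array([3, 4, 2, 1, 7, 8, 6, 5, 9])
n = 5

top = np.partition(x, -n)[-n:]  # the n largest values, in no guaranteed order
top = np.sort(top)[::-1]        # sort just those n values, largest first
print(top)  # [9 8 7 6 5]
```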
Get the indices of N highest values in an ndarray
You can use numpy.argpartition on a flattened version of the array first to get the indices of the top k items, and then convert those 1D indices to match the array's shape using numpy.unravel_index:
>>> arr = np.arange(100*100*100).reshape(100, 100, 100)
>>> np.random.shuffle(arr)
>>> indices = np.argpartition(arr.flatten(), -2)[-2:]
>>> np.vstack(np.unravel_index(indices, arr.shape)).T
array([[97, 99, 98],
       [97, 99, 99]])
>>> arr[97][99][98]
999998
>>> arr[97][99][99]
999999
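The same idea can be packaged as a reusable function that also orders the result. This is a sketch under my own naming (top_n_coords is hypothetical), using a small unshuffled array so the expected coordinates are easy to check by hand.

```python
import numpy as np

def top_n_coords(arr, n):
    """Return an (n, arr.ndim) array of coordinates of the n largest values, largest first."""
    flat = arr.ravel()                      # a view when possible, otherwise a copy
    idx = np.argpartition(flat, -n)[-n:]    # unordered flat indices of the top n
    idx = idx[np.argsort(flat[idx])[::-1]]  # order them by value, descending
    return np.column_stack(np.unravel_index(idx, arr.shape))

arr = np.arange(24).reshape(2, 3, 4)
coords = top_n_coords(arr, 2)
print(coords)                  # rows: [1, 2, 3] and [1, 2, 2]
print(arr[tuple(coords[0])])   # 23
```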
Python: Find the indices of the top-n largest values in a list or numpy.ndarray
Sort the indices by value, then keep the first n:
def most_find(sequence, n):
    # Sort the positions 0..len-1 by the value stored at each position, largest first
    lst = sorted(range(len(sequence)), key=lambda x: sequence[x], reverse=True)
    return lst[:n]
a = [1, 5, 6, 2, 3]
result = most_find(a, 3)
print(result)
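When n is much smaller than the sequence, a heap-based selection avoids sorting every index. The following is a sketch using the standard library's heapq.nlargest; the name most_find_heap is mine and not part of the answer above.

```python
import heapq

def most_find_heap(sequence, n):
    """Return the indices of the n largest items, largest first, without a full sort."""
    # Rank positions by the value stored at each position
    return heapq.nlargest(n, range(len(sequence)), key=sequence.__getitem__)

a = [1, 5, 6, 2, 3]
print(most_find_heap(a, 3))  # [2, 1, 4]
```

This works for plain lists and for 1-D arrays alike, although for large NumPy arrays np.argpartition is likely to be faster.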