Frequency counts for unique values in a NumPy array
Take a look at np.bincount:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
And then (wrapping in list() for Python 3, where zip returns an iterator):
list(zip(ii, y[ii]))
# [(1, 5), (2, 3), (5, 1), (25, 1)]
or:
np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])
or however you want to combine the counts and the unique values.
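For instance, one convenient combined form is a value-to-count dictionary, using the same x as above:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = np.bincount(x)           # counts indexed by value
ii = np.nonzero(y)[0]        # values that actually occur

# combine into a value -> count dictionary
freq = dict(zip(ii.tolist(), y[ii].tolist()))
print(freq)  # {1: 5, 2: 3, 5: 1, 25: 1}
```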
frequency of unique values for 2d numpy array
You can use numpy.unique() with axis=0 and pass return_counts=True. It will return a tuple with the unique rows and the counts for those rows.
np.unique(arr, return_counts=True, axis=0)
OUTPUT:
(array([[0, 1],
        [1, 0]]), array([1, 1], dtype=int64))
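The input array is not shown in the answer; a self-contained sketch with a hypothetical small 2-D array looks like this:

```python
import numpy as np

# hypothetical input; the row [0, 1] appears twice
arr = np.array([[0, 1],
                [1, 0],
                [0, 1]])

rows, counts = np.unique(arr, return_counts=True, axis=0)
print(rows)    # [[0 1]
               #  [1 0]]
print(counts)  # [2 1]
```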
Frequency counts of unique values in a numpy matrix
Assuming that all values in the matrix are non-negative integers, you can flatten the matrix to a 1-D array and then use bincount:
np.bincount(np.asarray(mat).ravel())
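A runnable sketch with a hypothetical small matrix, pairing the nonzero counts with their values as in the first answer:

```python
import numpy as np

# hypothetical matrix of non-negative integers
mat = np.matrix([[1, 1, 2],
                 [2, 2, 5]])

counts = np.bincount(np.asarray(mat).ravel())
vals = np.nonzero(counts)[0]
print(list(zip(vals.tolist(), counts[vals].tolist())))  # [(1, 2), (2, 3), (5, 1)]
```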
How to count frequency of a element in numpy array?
Use numpy.unique with the return_counts=True parameter, which will return the count of each element in the array.
# sample array
In [89]: np.random.seed(23)
In [90]: arr = np.random.randint(0, 10, 20)
In [92]: a, cnts = np.unique(arr, return_counts=True)
In [94]: high_freq, high_freq_element = cnts.max(), a[cnts.argmax()]
In [95]: high_freq, high_freq_element
Out[95]: (4, 9)
For selecting only the elements which appear above a certain frequency threshold, you can use:
In [96]: threshold = 2
# select elements which occur more than 2 times
In [97]: a[cnts > threshold]
Out[97]: array([3, 5, 6, 9])
Interpretation of counts for `numpy.unique` when applied on a matrix
When you specify an axis, np.unique returns unique subarrays indexed along this axis. To see this better, assume that one of the rows repeats:
m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
    [1, 2, 1]
])
In such case np.unique(m_sample, axis=0, return_counts=True)
gives:
(array([[1, 2, 1],
        [1, 4, 5],
        [2, 2, 2],
        [3, 3, 3]]),
 array([2, 1, 1, 1]))
The first element of this tuple lists unique rows of the array, and the second how many times each row appears in the array. In this example, the row [1, 2, 1]
is repeated twice.
To get unique values in each row you can try, for example, the following:
import numpy as np
m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5]
])
s = np.sort(m_sample, axis=1)
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
Most efficient way to calculate frequency of pairs of numbers in a 2D Numpy array
If your elements are not-too-large non-negative integers, bincount is fast:
from collections import Counter
from itertools import combinations
import numpy as np

def pairs(a):
    M = a.max() + 1
    a = a.T
    return sum(np.bincount((M * a[j] + a[j+1:]).ravel(), None, M*M)
               for j in range(len(a) - 1)).reshape(M, M)

def pairs_F_3(a):
    M = a.max() + 1
    return (np.bincount(a[1:].ravel() + M*a[:2].ravel(), None, M*M) +
            np.bincount(a[2].ravel() + M*a[0].ravel(), None, M*M))

def pairs_F(a):
    M = a.max() + 1
    a = np.ascontiguousarray(a.T)  # contiguous columns (rows after .T)
                                   # typically perform better
                                   # thanks @ning chen
    return sum(np.bincount((M * a[j] + a[j+1:]).ravel(), None, M*M)
               for j in range(len(a) - 1)).reshape(M, M)

def pairs_dict(a):
    p = pairs_F(a)
    # p is a 2D table with the frequency of (y, x) at position y, x
    y, x = np.where(p)
    c = p[y, x]
    return {(yi, xi): ci for yi, xi, ci in zip(y, x, c)}

def pair_freq(a, sort=False, sort_axis=-1):
    a = np.asarray(a)
    if sort:
        a = np.sort(a, axis=sort_axis)
    res = Counter()
    for row in a:
        res.update(combinations(row, 2))
    return res
from timeit import timeit
A = [np.random.randint(0, 1000, (1000, 120)),
     np.random.randint(0, 100, (100000, 12))]
for a in A:
    print('shape:', a.shape, 'range:', a.max() + 1)
    res2 = pairs_dict(a)
    res = pair_freq(a)
    print(f'results equal: {res == res2}')
    print('bincount', timeit(lambda: pairs(a), number=10)*100, 'ms')
    print('bc(F)   ', timeit(lambda: pairs_F(a), number=10)*100, 'ms')
    print('bc->dict', timeit(lambda: pairs_dict(a), number=10)*100, 'ms')
    print('Counter ', timeit(lambda: pair_freq(a), number=4)*250, 'ms')
Sample run:
shape: (1000, 120) range: 1000
results equal: True
bincount 461.14772390574217 ms
bc(F) 435.3669326752424 ms
bc->dict 932.1215840056539 ms
Counter 3473.3258984051645 ms
shape: (100000, 12) range: 100
results equal: True
bincount 89.80463854968548 ms
bc(F) 43.449611216783524 ms
bc->dict 46.470773220062256 ms
Counter 1987.6734036952257 ms
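On a tiny array the bincount encoding used above (a pair (y, x) taken from columns j < k of a row is stored at flat index M*y + x) can be checked against a Counter reference. This is a standalone sketch with a hypothetical 2x3 input, not part of the timed benchmark:

```python
import numpy as np
from collections import Counter
from itertools import combinations

# hypothetical small input for a hand check
a = np.array([[0, 1, 2],
              [0, 1, 1]])
M = a.max() + 1
aT = a.T  # iterate over columns as rows

# bincount-based pair table, same encoding as pairs()/pairs_F()
p = sum(np.bincount((M * aT[j] + aT[j+1:]).ravel(), None, M*M)
        for j in range(len(aT) - 1)).reshape(M, M)

# Counter reference over ordered column pairs
ref = Counter()
for row in a:
    ref.update(combinations(row, 2))
assert all(p[y, x] == c for (y, x), c in ref.items())

ys, xs = np.nonzero(p)
print([((int(y), int(x)), int(p[y, x])) for y, x in zip(ys, xs)])
# [((0, 1), 3), ((0, 2), 1), ((1, 1), 1), ((1, 2), 1)]
```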
Combine counts from multiple numpy.uniques
You can apply np.unique to the concatenated per-chunk unique values to get the array of all unique values and, with return_inverse=True, the location of each item in that merged sorted array. Then accumulate the per-chunk counts at those locations to get the merged counts.
all_unique_values, index = np.unique(multi_unique_values, return_inverse=True)
all_unique_counts = np.zeros(all_unique_values.size, np.int64)
np.add.at(all_unique_counts, index, multi_unique_counts.ravel())  # in place
all_unique_counts
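The answer leaves multi_unique_values and multi_unique_counts undefined; a self-contained sketch with two hypothetical per-chunk np.unique results:

```python
import numpy as np

# hypothetical results from two separate np.unique calls
u1, c1 = np.unique(np.array([1, 1, 2, 5]), return_counts=True)  # [1, 2, 5], [2, 1, 1]
u2, c2 = np.unique(np.array([2, 2, 3]), return_counts=True)     # [2, 3], [2, 1]

multi_unique_values = np.concatenate([u1, u2])
multi_unique_counts = np.concatenate([c1, c2])

all_unique_values, index = np.unique(multi_unique_values, return_inverse=True)
all_unique_counts = np.zeros(all_unique_values.size, np.int64)
np.add.at(all_unique_counts, index, multi_unique_counts.ravel())  # in place

print(all_unique_values)  # [1 2 3 5]
print(all_unique_counts)  # [2 3 1 1]
```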
Efficiently counting number of unique elements - NumPy / Python
Here's a method that works for an array with dtype np.uint8 and that is faster than np.unique.
First, create an array to work with:
In [128]: a = np.random.randint(1, 128, size=(10, 3000, 3000)).astype(np.uint8)
For later comparison, find the unique values using np.unique
:
In [129]: u = np.unique(a)
Here's the faster method; v will contain the result:
In [130]: q = np.zeros(256, dtype=int)
In [131]: q[a.ravel()] = 1
In [132]: v = np.nonzero(q)[0]
Verify that we got the same result:
In [133]: np.array_equal(u, v)
Out[133]: True
Timing:
In [134]: %timeit u = np.unique(a)
2.86 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [135]: %timeit q = np.zeros(256, dtype=int); q[a.ravel()] = 1; v = np.nonzero(q)
300 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So 2.86 seconds for np.unique()
, and 0.3 seconds for the alternative method.
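If you also need the frequencies, not just the unique values, the same idea extends naturally with np.bincount, which is similarly fast for small-integer dtypes (a sketch with a hypothetical small array):

```python
import numpy as np

# hypothetical small uint8 array
a = np.array([3, 3, 7, 0, 7, 7], dtype=np.uint8)

counts = np.bincount(a.ravel(), minlength=256)  # one slot per possible uint8 value
v = np.nonzero(counts)[0]                       # unique values
freqs = counts[v]                               # their frequencies
print(v.tolist(), freqs.tolist())               # [0, 3, 7] [1, 2, 3]
```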
efficient way to count frequency of 2D numpy array
You could use scipy.stats.itemfreq (note that itemfreq was deprecated and has been removed from recent SciPy versions, so on current installs prefer the np.unique approach below):
from scipy.stats import itemfreq
itemfreq(r_nm)
np.unique works too
import numpy as np
(unique, counts) = np.unique(r_nm, return_counts=True)
frequencies = np.asarray((unique, counts)).T