Frequency counts for unique values in a NumPy array

Take a look at np.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

And then:

list(zip(ii, y[ii]))  # wrap in list() since zip returns an iterator in Python 3
# [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])

or however you want to combine the counts and the unique values.
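For instance, using the same x, y, and ii as above, a plain dict is one convenient way to combine them:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = np.bincount(x)       # counts indexed by value
ii = np.nonzero(y)[0]    # the values that actually occur

# map each unique value to its count
freq = dict(zip(ii.tolist(), y[ii].tolist()))
# {1: 5, 2: 3, 5: 1, 25: 1}
```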

Frequency of unique values for a 2D NumPy array

You can use numpy.unique() with axis=0 and return_counts=True. It returns a tuple with the unique rows and the counts for those rows.

np.unique(arr, return_counts=True, axis=0)

OUTPUT:

(array([[0, 1],
        [1, 0]]),
 array([1, 1], dtype=int64))
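A minimal self-contained run (the arr below is an assumed stand-in for the original question's array, with one repeated row) that shows the shape of the result:

```python
import numpy as np

# hypothetical input with one repeated row
arr = np.array([[0, 1],
                [1, 0],
                [0, 1]])

rows, counts = np.unique(arr, return_counts=True, axis=0)
# rows   -> array([[0, 1],
#                  [1, 0]])
# counts -> array([2, 1])
```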

Frequency counts of unique values in a numpy matrix

Assuming that all values in the matrix are non-negative integers, you can convert the matrix to a flat NumPy array and then use np.bincount:

np.bincount(np.array(mat).reshape(1,mat.size)[0])
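For example, with a small hypothetical matrix; np.asarray(mat).ravel() is an equivalent, slightly simpler way to flatten it:

```python
import numpy as np

mat = np.matrix([[1, 2],
                 [2, 3]])  # hypothetical example matrix

counts = np.bincount(np.asarray(mat).ravel())
# counts[v] is the frequency of value v:
# array([0, 1, 2, 1])  -> no zeros, one 1, two 2s, one 3
```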

How to count the frequency of an element in a NumPy array?

Use numpy.unique with the return_counts=True parameter, which returns the count of each element in the array.

# sample array
In [89]: np.random.seed(23)
In [90]: arr = np.random.randint(0, 10, 20)

In [92]: a, cnts = np.unique(arr, return_counts=True)
In [94]: high_freq, high_freq_element = cnts.max(), a[cnts.argmax()]

In [95]: high_freq, high_freq_element
Out[95]: (4, 9)

For selecting only the elements which appear above a certain frequency threshold, you can use:

In [96]: threshold = 2

# select elements which occur more than 2 times
In [97]: a[cnts > threshold]
Out[97]: array([3, 5, 6, 9])

Interpretation of counts for `numpy.unique` when applied on a matrix

When you specify an axis, np.unique returns the unique subarrays indexed along that axis. To see this more clearly, assume that one of the rows repeats:

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
    [1, 2, 1]
])

In that case, np.unique(m_sample, axis=0, return_counts=True) gives:

(array([[1, 2, 1],
        [1, 4, 5],
        [2, 2, 2],
        [3, 3, 3]]),
 array([2, 1, 1, 1]))

The first element of this tuple lists unique rows of the array, and the second how many times each row appears in the array. In this example, the row [1, 2, 1] is repeated twice.

To get unique values in each row you can try, for example, the following:

import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5]
])

s = np.sort(m_sample, axis=1)                          # sort each row
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]                    # True where a value differs from its left neighbor
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]    # regroup the deduplicated values row by row

It gives:

[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]

Most efficient way to calculate frequency of pairs of numbers in a 2D Numpy array

If your elements are nonnegative integers that are not too large, bincount is fast:

from collections import Counter
from itertools import combinations
import numpy as np

def pairs(a):
    M = a.max() + 1
    a = a.T
    return sum(np.bincount((M * a[j] + a[j+1:]).ravel(), None, M*M)
               for j in range(len(a) - 1)).reshape(M, M)

def pairs_F_3(a):
    M = a.max() + 1
    return (np.bincount(a[1:].ravel() + M*a[:2].ravel(), None, M*M) +
            np.bincount(a[2].ravel() + M*a[0].ravel(), None, M*M))

def pairs_F(a):
    M = a.max() + 1
    a = np.ascontiguousarray(a.T)  # contiguous columns (rows after .T)
                                   # appear to typically perform better
                                   # thanks @ning chen
    return sum(np.bincount((M * a[j] + a[j+1:]).ravel(), None, M*M)
               for j in range(len(a) - 1)).reshape(M, M)

def pairs_dict(a):
    p = pairs_F(a)
    # p is a 2D table with the frequency of (y, x) at position y, x
    y, x = np.where(p)
    c = p[y, x]
    return {(yi, xi): ci for yi, xi, ci in zip(y, x, c)}

def pair_freq(a, sort=False, sort_axis=-1):
    a = np.asarray(a)
    if sort:
        a = np.sort(a, axis=sort_axis)
    res = Counter()
    for row in a:
        res.update(combinations(row, 2))
    return res


from timeit import timeit

A = [np.random.randint(0, 1000, (1000, 120)),
     np.random.randint(0, 100, (100000, 12))]
for a in A:
    print('shape:', a.shape, 'range:', a.max() + 1)
    res2 = pairs_dict(a)
    res = pair_freq(a)
    print(f'results equal: {res == res2}')
    print('bincount', timeit(lambda: pairs(a), number=10)*100, 'ms')
    print('bc(F)   ', timeit(lambda: pairs_F(a), number=10)*100, 'ms')
    print('bc->dict', timeit(lambda: pairs_dict(a), number=10)*100, 'ms')
    print('Counter ', timeit(lambda: pair_freq(a), number=4)*250, 'ms')

Sample run:

shape: (1000, 120) range: 1000
results equal: True
bincount 461.14772390574217 ms
bc(F) 435.3669326752424 ms
bc->dict 932.1215840056539 ms
Counter 3473.3258984051645 ms
shape: (100000, 12) range: 100
results equal: True
bincount 89.80463854968548 ms
bc(F) 43.449611216783524 ms
bc->dict 46.470773220062256 ms
Counter 1987.6734036952257 ms

Combine counts from multiple numpy.uniques

You can apply np.unique to the concatenated values to get an array of all the unique values and, at the same time, the position of each original item in that sorted array (via return_inverse=True). You can then accumulate the per-source counts at those positions to get the merged counts.

all_unique_values, index = np.unique(multi_unique_values, return_inverse=True)
all_unique_counts= np.zeros(all_unique_values.size, np.int64)
np.add.at(all_unique_counts, index, multi_unique_counts.ravel()) # inplace
all_unique_counts
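A runnable sketch of that merge, with assumed inputs: here multi_unique_values holds the unique values from two separate np.unique calls and multi_unique_counts the matching counts:

```python
import numpy as np

# counts from two hypothetical separate np.unique(..., return_counts=True) calls
values_a, counts_a = np.array([1, 2, 5]), np.array([3, 1, 2])
values_b, counts_b = np.array([2, 5, 7]), np.array([4, 1, 6])

multi_unique_values = np.concatenate([values_a, values_b])
multi_unique_counts = np.concatenate([counts_a, counts_b])

all_unique_values, index = np.unique(multi_unique_values, return_inverse=True)
all_unique_counts = np.zeros(all_unique_values.size, np.int64)
np.add.at(all_unique_counts, index, multi_unique_counts.ravel())  # in-place scatter-add
# all_unique_values -> array([1, 2, 5, 7])
# all_unique_counts -> array([3, 5, 3, 6])
```

np.add.at is used instead of plain fancy-index assignment because it accumulates correctly when the same index appears more than once.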

Efficiently counting number of unique elements - NumPy / Python

Here's a method that works for an array with dtype np.uint8 that is faster than np.unique.

First, create an array to work with:

In [128]: a = np.random.randint(1, 128, size=(10, 3000, 3000)).astype(np.uint8)

For later comparison, find the unique values using np.unique:

In [129]: u = np.unique(a)

Here's the faster method; v will contain the result:

In [130]: q = np.zeros(256, dtype=int)

In [131]: q[a.ravel()] = 1

In [132]: v = np.nonzero(q)[0]

Verify that we got the same result:

In [133]: np.array_equal(u, v)
Out[133]: True

Timing:

In [134]: %timeit u = np.unique(a)
2.86 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [135]: %timeit q = np.zeros(256, dtype=int); q[a.ravel()] = 1; v = np.nonzero(q)
300 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So 2.86 seconds for np.unique(), and 0.3 seconds for the alternative method.
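If you also need the frequencies, not just the unique values, np.bincount with minlength=256 applies the same one-pass idea under the same uint8 assumption (sketched here on a smaller demo array):

```python
import numpy as np

a = np.random.randint(1, 128, size=(10, 300, 300)).astype(np.uint8)  # smaller demo array

q = np.bincount(a.ravel(), minlength=256)  # q[v] = number of occurrences of byte value v
v = np.nonzero(q)[0]                       # the unique values
counts = q[v]                              # their frequencies

assert np.array_equal(v, np.unique(a))
assert counts.sum() == a.size
```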

Efficient way to count frequencies in a 2D NumPy array

You could use scipy.stats.itemfreq (note: itemfreq was deprecated and later removed from SciPy; on modern versions, prefer np.unique with return_counts=True as shown next):

from scipy.stats import itemfreq
itemfreq(r_nm)

np.unique works too

import numpy as np
(unique, counts) = np.unique(r_nm, return_counts=True)
frequencies = np.asarray((unique, counts)).T

