Find Unique Rows in Numpy.Array

Find unique rows in numpy.array

As of NumPy 1.13, one can simply choose the axis for selection of unique values in any N-dim array. To get unique rows, one can do:

unique_rows = np.unique(original_array, axis=0)
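As a minimal sketch of the axis argument, assuming a small made-up array (the values below are illustrative, not from the original question), axis=0 compares whole rows and axis=1 compares whole columns:

```python
import numpy as np

# Hypothetical sample data: rows 0 and 1 are identical, as are columns 0 and 2.
original_array = np.array([[1, 0, 1],
                           [1, 0, 1],
                           [2, 3, 2]])

unique_rows = np.unique(original_array, axis=0)  # compare whole rows
unique_cols = np.unique(original_array, axis=1)  # compare whole columns

print(unique_rows)
print(unique_cols)
```

Note that the result is sorted lexicographically, so the original row order is not necessarily preserved.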

numpy find unique rows (only appeared once)

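You can use np.unique with return_index and return_counts to identify the rows that appear only once. In the snippet below, A is the stacked array of rows and B is assumed to hold the number of rows in each group, so the result is split back into groups: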

_, i, c = np.unique(A, axis=0, return_index=True, return_counts=True)

idx = np.isin(np.arange(len(A)), i[c==1])

out = [a[i] for a,i in zip(np.split(A, B.cumsum()[:-1]),
                           np.split(idx, B.cumsum()[:-1]))]

output:

[array([[ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]
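Since A and B are not defined in the snippet above, here is a self-contained sketch of the same steps. The arrays are hypothetical: A stacks two groups of rows and B is assumed to give the size of each group.

```python
import numpy as np

# Hypothetical data: A stacks two groups of rows, B gives each group's size.
A = np.array([[1, 1, 1],
              [2, 3, 4],
              [1, 1, 1],
              [5, 8, 10]])
B = np.array([2, 2])

# i: index of the first occurrence of each unique row; c: how often it occurs.
_, i, c = np.unique(A, axis=0, return_index=True, return_counts=True)

# Boolean mask over the rows of A, True where the row occurs exactly once.
idx = np.isin(np.arange(len(A)), i[c == 1])

# Split rows and mask at the group boundaries, then filter each group.
bounds = B.cumsum()[:-1]
out = [a[m] for a, m in zip(np.split(A, bounds), np.split(idx, bounds))]
print(out)
```

Here the row [1, 1, 1] occurs twice across the whole array, so it is dropped from both groups.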

How do I find unique rows in a numpy array with different shaped rows?

If the items are hashable, you could try:

set(tuple(i) for i in array)

which gives:

{('item1', 'item2'),
 ('item1', 'item2', 'item3'),
 ('item1', 'item2', 'item3', 'item4')}
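A short sketch of this idea, assuming the ragged rows are held in a plain Python list (the name rows and its contents are illustrative). Because a set does not keep insertion order, the sketch also shows dict.fromkeys as one way to keep the first-seen order:

```python
# Hypothetical ragged rows held in a plain Python list.
rows = [["item1", "item2"],
        ["item1", "item2", "item3"],
        ["item1", "item2"],
        ["item1", "item2", "item3", "item4"]]

# A set removes duplicates but does not keep the original order.
unique_set = set(tuple(r) for r in rows)

# dict.fromkeys also removes duplicates and keeps first-seen order.
unique_ordered = list(dict.fromkeys(tuple(r) for r in rows))
print(unique_ordered)
```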

Get unique rows from Numpy Array based on a value within the row

You could check where in the array the first value in a row is equal to that of the next row, and index based on the result:

dataA[dataA[:, 0] == np.roll(dataA, -1, axis=0)[:, 0]]

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874]])

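If the rows are not sorted by the first value, sort them first (a stable sort keeps the original order among rows with equal keys) and apply the same comparison to the sorted array:

s = dataA[:, 0].argsort(kind="stable")
dataA[s][dataA[s, 0] == np.roll(dataA[s], -1, axis=0)[:, 0]]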

For the second example it yields:

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ,         nan,         nan],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735,         nan,         nan],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874,         nan,         nan],
       [110.       ,   7.727924 ,   7.116364 ,  90.45003  ,   0.5366358,
          0.8887361,         nan,         nan]])
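The comparison against the rolled array can be tried on a small made-up array. This is only a sketch: the data below is hypothetical, with two columns instead of the six or eight shown above, and it assumes each key in the first column occurs at most twice.

```python
import numpy as np

# Hypothetical unsorted data: column 0 is the key, column 1 a payload.
data = np.array([[108.0, 2.0],
                 [107.0, 1.0],
                 [108.0, np.nan],
                 [109.0, 3.0],
                 [107.0, np.nan]])

# A stable sort keeps the original order among rows that share a key.
order = data[:, 0].argsort(kind="stable")
sorted_data = data[order]

# True where a row's key equals the key of the row that follows it.
next_keys = np.roll(sorted_data, -1, axis=0)[:, 0]
mask = sorted_data[:, 0] == next_keys

result = sorted_data[mask]
print(result)
```

Only rows whose key is repeated in the following row are kept, so the row with key 109, which has no partner, is dropped. np.roll wraps around, so the last row is compared with the first.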

Retain order when taking unique rows in a NumPy array

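Use return_index to get the index of the first occurrence of each unique row, then sort those indices to restore the original order:

_, idx = np.unique(stacked, axis=0, return_index=True)

stacked[np.sort(idx)]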
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [15, 16, 17],
       [18, 19, 20],
       [ 4,  5,  5]])
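A self-contained sketch of the same approach, assuming a small hypothetical array in place of stacked:

```python
import numpy as np

# Hypothetical input with duplicates; the rows are deliberately not sorted.
stacked = np.array([[3, 4, 5],
                    [0, 1, 2],
                    [3, 4, 5],
                    [6, 7, 8],
                    [0, 1, 2]])

# idx holds the position of the first occurrence of each unique row.
_, idx = np.unique(stacked, axis=0, return_index=True)

# Sorting the indices restores the order in which rows first appeared.
ordered_unique = stacked[np.sort(idx)]
print(ordered_unique)
```

Without np.sort(idx), the rows would come back in lexicographic order, with [0, 1, 2] first.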

Remove duplicate rows of a numpy array

You can use np.unique. Since you want unique rows rather than unique elements, the original answer puts each row into a tuple. Starting from:

import numpy as np

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])

Just applying np.unique to the data array flattens it and returns the unique elements:

>>> uniques = np.unique(data)
>>> uniques
array([1, 3, 4, 8, 9])

Putting the rows into tuples first:

new_array = [tuple(row) for row in data]
uniques = np.unique(new_array)

which prints:

>>> uniques
array([[1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4]])

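UPDATE

In NumPy 1.13 and later, pass axis=0 instead: np.unique(data, axis=0). Current versions convert the list of tuples back into a 2-D array and flatten it, so the tuple approach returns the unique elements rather than the unique rows.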
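As a quick check on the data from this answer, the sketch below contrasts the default call with the axis=0 call (the variable names flat_uniques and row_uniques are illustrative):

```python
import numpy as np

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

flat_uniques = np.unique(data)         # default: the array is flattened first
row_uniques = np.unique(data, axis=0)  # axis=0: whole rows are compared

print(flat_uniques)
print(row_uniques)
```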

How to get the number of unique array elements in N-dimensional np array?

For an array built from a list of lists, you can count how often each unique row occurs by passing axis=0 (to compare rows) and return_counts=True to numpy.unique():

>>> a = np.array([(1,2,3),(1,2,3),(3,4,5),(5,6,7)])
>>> np.unique(a, return_counts=True, axis=0)
(array([[1, 2, 3],
       [3, 4, 5],
       [5, 6, 7]]), array([2, 1, 1]))

The first return value is the array of unique rows, and the second is the count for each of those rows. Without return_counts=True, you would get only the first return value. Without axis=0, the whole array would be flattened and its individual elements counted. With axis=0, each subarray indexed along the first axis is flattened (if it has more than one dimension) and treated as a single element for the uniqueness check.
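To illustrate the point about subarrays with more than one dimension, here is a small sketch with a hypothetical 3-D array, where each entry along axis 0 is a 2x2 block:

```python
import numpy as np

# Hypothetical 3-D array: three 2x2 blocks, the first two identical.
blocks = np.array([[[1, 2], [3, 4]],
                   [[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])

unique_blocks, counts = np.unique(blocks, axis=0, return_counts=True)

print(unique_blocks.shape)  # each unique entry is still a 2x2 block
print(counts)
print(len(unique_blocks))   # number of distinct blocks
```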

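If you can use tuples instead of lists for the rows, you can store them in a 1-D object array and call numpy.unique() without the axis option, since each tuple is then treated as a single element.

To keep NumPy from converting the list of tuples into a 2-D array, create an empty object array and assign the tuples into it.

Together, it should look something like this: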

>>> l = [(1,2,3),(1,2,3),(3,4,5),(5,6,7)]
>>> a = np.empty(len(l), dtype=object)
>>> a
array([None, None, None, None], dtype=object)
>>> a[:] = l
>>> a
array([(1, 2, 3), (1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object)
>>> np.unique(a, return_counts=True)
(array([(1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object), array([2, 1, 1]))

Get unique values in a list of numpy arrays

In general, the best option is to use np.unique with axis=0 and the return_index and return_counts options:

u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)

Then, according to the documentation:

u is the array of unique rows
idx holds the indices of the first occurrence of each unique row in X
counts is the number of times each unique row appears in X

If you need a dictionary, note that NumPy arrays are not hashable and so cannot be used as keys. You can store the rows as tuples instead, as in @yatu's answer or like this:

dict(zip([tuple(n) for n in u], counts))
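Putting the pieces together, a minimal sketch with a hypothetical X. Converting via tolist() is a choice made here so that the keys and counts are plain Python ints rather than NumPy scalars:

```python
import numpy as np

# Hypothetical input: the row [1, 2] appears twice.
X = np.array([[1, 2],
              [3, 4],
              [1, 2]])

u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)

# Tuples are hashable, so they can serve as dictionary keys.
row_counts = {tuple(row): n for row, n in zip(u.tolist(), counts.tolist())}
print(row_counts)
```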
