Find unique rows in numpy.array
As of NumPy 1.13, one can simply choose the axis for selection of unique values in any N-dim array. To get unique rows, one can do:
unique_rows = np.unique(original_array, axis=0)
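A minimal sketch of the call above. The array contents are made up for illustration. Note that np.unique returns the unique rows in sorted (lexicographic) order, not in order of first appearance.

```python
import numpy as np

# Hypothetical input: rows 0 and 2 are identical.
original_array = np.array([[1, 0, 1],
                           [0, 1, 1],
                           [1, 0, 1]])

# axis=0 compares whole rows instead of individual elements.
unique_rows = np.unique(original_array, axis=0)
print(unique_rows)
```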
numpy find unique rows (only appeared once)
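You can use np.unique to identify the duplicates. Here A is the 2-D array of rows, and B is assumed to hold the sizes of the groups that A is split into:
# i: index of the first occurrence of each unique row; c: how often it occurs
_, i, c = np.unique(A, axis=0, return_index=True, return_counts=True)
# boolean mask over the rows of A: True where the row occurs exactly once
idx = np.isin(np.arange(len(A)), i[c==1])
# split A and the mask into the same groups, then filter each group
out = [a[m] for a, m in zip(np.split(A, B.cumsum()[:-1]),
np.split(idx, B.cumsum()[:-1]))]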
output:
[array([[ 2, 3, 4],
[ 5, 8, 10],
[ 5, 9, 9]]),
array([[ 7, 9, 6],
[ 9, 2, 4],
[ 9, 3, 6],
[10, 3, 3],
[11, 2, 2]])]
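For a single array without the group split, the same idea can be sketched as follows. The data here is hypothetical, and the final np.sort is an addition that restores the original row order.

```python
import numpy as np

# Hypothetical data: [1, 2] appears twice, the other rows once.
A = np.array([[1, 2],
              [3, 4],
              [1, 2],
              [0, 5]])

# i holds the first-occurrence index of each unique row, c its count.
_, i, c = np.unique(A, axis=0, return_index=True, return_counts=True)

# Keep only rows that occur exactly once, in their original order.
once = A[np.sort(i[c == 1])]
print(once)
```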
How do I find unique rows in a numpy array with different shaped rows?
If the items are hashable, you could try:
set(tuple(i) for i in array)
which gives:
{('item1', 'item2'),
('item1', 'item2', 'item3'),
('item1', 'item2', 'item3', 'item4')}
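A small sketch of this approach on made-up ragged data. The dict.fromkeys variant is not part of the answer above; it is one way to keep first-appearance order, since a set does not preserve order.

```python
# Hypothetical ragged input: rows of different lengths, with one duplicate.
rows = [
    ['item1', 'item2'],
    ['item1', 'item2', 'item3'],
    ['item1', 'item2'],
]

# Tuples are hashable, so a set removes duplicates (order is not preserved).
unique_rows = set(tuple(r) for r in rows)

# dict keys keep insertion order, so this preserves first appearance.
ordered_unique = list(dict.fromkeys(tuple(r) for r in rows))
print(ordered_unique)
```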
Get unique rows from Numpy Array based on a value within the row
You could check where in the array the first value in a row is equal to that of the next row, and index based on the result:
dataA[dataA[:, 0] == np.roll(dataA, -1, axis=0)[:, 0]]
array([[107. , 7.475729 , 6.573791 , 90.0126 , 0.5529882,
0.867588 ],
[108. , 7.838725 , 6.961871 , 89.52572 , 0.5610707,
0.7769735],
[109. , 7.079929 , 6.86194 , 89.6181 , 0.5660294,
0.8596874]])
If the rows are not ordered based on the first value, instead use:
s = dataA[:, 0].argsort()
sorted_rows = dataA[s]  # sort first, so that equal first values become adjacent
sorted_rows[sorted_rows[:, 0] == np.roll(sorted_rows, -1, axis=0)[:, 0]]
For the second example it yields:
array([[107. , 7.475729 , 6.573791 , 90.0126 , 0.5529882,
0.867588 , nan, nan],
[108. , 7.838725 , 6.961871 , 89.52572 , 0.5610707,
0.7769735, nan, nan],
[109. , 7.079929 , 6.86194 , 89.6181 , 0.5660294,
0.8596874, nan, nan],
[110. , 7.727924 , 7.116364 , 90.45003 , 0.5366358,
0.8887361, nan, nan]])
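To make the np.roll comparison concrete, here is a small sketch on hypothetical two-column data that is already sorted by its first column.

```python
import numpy as np

# Hypothetical data: column 0 is an ID; 107 and 108 each appear twice.
dataA = np.array([[107., 1.0],
                  [107., 1.5],
                  [108., 2.0],
                  [108., 2.5],
                  [109., 3.0]])

# Rolling by -1 along axis 0 shifts rows up by one, so position k holds
# row k+1 (the last position wraps around to row 0).
next_first = np.roll(dataA, -1, axis=0)[:, 0]

# True where a row's first value equals the next row's first value.
mask = dataA[:, 0] == next_first
print(dataA[mask])
```

Because of the wrap-around, the last row is compared with the first row, so it would also be selected if those two happened to share the same first value.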
Retain order when taking unique rows in a NumPy array
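Use return_index to get the index of the first occurrence of each unique row, then sort those indices to restore the original order:
_, idx = np.unique(stacked, axis=0, return_index=True)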
stacked[np.sort(idx)]
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[15, 16, 17],
[18, 19, 20],
[ 4, 5, 5]])
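A self-contained sketch of the same steps, using a small made-up array named stacked so the effect of the sort is easy to follow.

```python
import numpy as np

# Hypothetical input with duplicates; first appearances are rows 0, 1 and 4.
stacked = np.array([[3, 4, 5],
                    [0, 1, 2],
                    [3, 4, 5],
                    [0, 1, 2],
                    [6, 7, 8]])

# idx gives, for each unique row (in sorted order), its first position.
_, idx = np.unique(stacked, axis=0, return_index=True)

# Sorting the positions restores the order of first appearance.
in_order = stacked[np.sort(idx)]
print(in_order)
```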
Remove duplicate rows of a numpy array
You can use np.unique with the axis argument (NumPy 1.13 and later):
import numpy as np
data = np.array([[1,8,3,3,4],
[1,8,9,9,4],
[1,8,3,3,4]])
Just applying np.unique to the data array flattens it and returns the unique elements:
>>> uniques = np.unique(data)
>>> uniques
array([1, 3, 4, 8, 9])
Converting the rows to tuples first does not change this, because np.unique converts the list of tuples back into a 2-D array and flattens it again. To treat each row as one item, pass axis=0:
>>> uniques = np.unique(data, axis=0)
>>> uniques
array([[1, 8, 3, 3, 4],
[1, 8, 9, 9, 4]])
How to get the number of unique array elements in N-dimensional np array?
With lists of lists, you can get the count of rows by using the axis=0 option (to specify rows) with the numpy.unique() function and the return_counts=True option:
>>> a = np.array([(1,2,3),(1,2,3),(3,4,5),(5,6,7)])
>>> np.unique(a, return_counts=True, axis=0)
(array([[1, 2, 3],
[3, 4, 5],
[5, 6, 7]]), array([2, 1, 1]))
The first return value is the unique rows, and the second return value is the counts for those rows. Without the return_counts=True option, you would only get the first return value. Without the axis=0 option, the whole array would be flattened for the purpose of counting unique elements. axis=0 specifies that each row (flattened, if the array has more than two dimensions) is treated as a single value.
If you can use tuples instead of lists for the rows, then you can store them in a 1-D object array and use numpy.unique() without the axis option.
This post explains how to use a list of tuples for a numpy array.
Together, it should look something like this:
>>> l = [(1,2,3),(1,2,3),(3,4,5),(5,6,7)]
>>> a = np.empty(len(l), dtype=object)
>>> a
array([None, None, None, None], dtype=object)
>>> a[:] = l
>>> a
array([(1, 2, 3), (1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object)
>>> np.unique(a, return_counts=True)
(array([(1, 2, 3), (3, 4, 5), (5, 6, 7)], dtype=object), array([2, 1, 1]))
Get unique values in a list of numpy arrays
In general, the best option is to use the np.unique function with custom parameters:
u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)
Then, according to the documentation:
u is an array of the unique arrays
idx is the indices of X that give the unique values
counts is the number of times each unique item appears in X
If you need a dictionary, you can't use NumPy arrays as keys because they are not hashable, so you might like to store them as tuples like in @yatu's answer or like this:
dict(zip([tuple(n) for n in u], counts))
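Putting the pieces together, a minimal sketch with a made-up list X of equal-length arrays. Converting with tolist() and int() is an optional extra so that the keys and values are plain Python objects rather than NumPy scalars.

```python
import numpy as np

# Hypothetical input: a list of 1-D arrays of equal length, one duplicated.
X = [np.array([1, 2]), np.array([3, 4]), np.array([1, 2])]

# The list is converted to a 2-D array; axis=0 compares whole rows.
u, idx, counts = np.unique(X, axis=0, return_index=True, return_counts=True)

# Arrays are not hashable, so each unique row becomes a tuple key.
row_counts = {tuple(row.tolist()): int(n) for row, n in zip(u, counts)}
print(row_counts)
```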