Numpy.Unique with Order Preserved

numpy.unique with order preserved

np.unique() is O(N log N) because it sorts, but you can preserve the original order of appearance with the following code:

import numpy as np
a = np.array(['b','a','b','b','d','a','a','c','c'])
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])

output:

['b' 'a' 'd' 'c']

pandas.unique() is much faster for big arrays, since it is hash-based and runs in O(N):

import numpy as np
import pandas as pd

a = np.random.randint(0, 1000, 10000)
%timeit np.unique(a)
%timeit pd.unique(a)

1000 loops, best of 3: 644 us per loop
10000 loops, best of 3: 144 us per loop
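Note that pd.unique also returns values in order of first appearance, so it solves the original problem directly. A minimal sketch using the same array as above:

```python
import numpy as np
import pandas as pd

a = np.array(['b', 'a', 'b', 'b', 'd', 'a', 'a', 'c', 'c'])

# pd.unique is hash-based and keeps the order of first appearance,
# so no return_index / np.sort round-trip is needed
print(pd.unique(a))  # ['b' 'a' 'd' 'c']
```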

numpy unique without sort

You can do this with the return_index parameter:


>>> import numpy as np
>>> a = [4,2,1,3,1,2,3,4]
>>> np.unique(a)
array([1, 2, 3, 4])
>>> indexes = np.unique(a, return_index=True)[1]
>>> [a[index] for index in sorted(indexes)]
[4, 2, 1, 3]

Retain order when taking unique rows in a NumPy array

Using return_index:

_, idx = np.unique(stacked, axis=0, return_index=True)

stacked[np.sort(idx)]
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [15, 16, 17],
       [18, 19, 20],
       [ 4,  5,  5]])
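Since stacked comes from the original question, here is a self-contained sketch of the same technique with a small assumed array containing one duplicate row:

```python
import numpy as np

# assumed example data; row 2 duplicates row 0
stacked = np.array([[0, 1, 2],
                    [3, 4, 5],
                    [0, 1, 2],
                    [6, 7, 8]])

# indices of the first occurrence of each unique row
_, idx = np.unique(stacked, axis=0, return_index=True)

# sorting the indices restores the original order of appearance
print(stacked[np.sort(idx)])
```

This prints the three unique rows in the order they first appear: [0, 1, 2], [3, 4, 5], [6, 7, 8].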

Baffled by numpy.unique()

From matplotlib documentation, paragraph "Plotting multiple sets of data":

"If x and/or y are 2D arrays a separate data set will be drawn for every column. If both x and y are 2D, they must have the same shape. If only one of them is 2D with shape (N, m) the other must have length N and will be used for every data set m."

It is not explicitly stated that all sublists must have the same length, but the passage only covers 2D arrays, not ragged nested sequences. To understand the behavior of plt.plot, imagine that x and y will be cast to NumPy arrays. In your second case, since y_lst contains lists of different lengths, that conversion cannot be made.

So I would go for something like this:

plt.figure(figsize=(7, 4))
for r in np.linspace(1, 4, 100):
    x = np.unique(logistic_calc(r, N))
    plt.plot([r], [x], '.', ms=.5, c="royalblue")  # a little bit tricky!
    # OR
    # plt.plot([r] * len(x), x, '.', ms=.5, c="royalblue")

...
plt.show()

numpy.unique has the problem with frozensets

numpy.unique operates by sorting, then collapsing runs of identical elements. Per the doc string:

Returns the sorted unique elements of an array.

The "sorted" part implies it's using a sort-collapse-adjacent technique (similar to what the *NIX sort | uniq pipeline accomplishes).

The problem is that while frozenset does define __lt__ (the overload for <, which most Python sorting algorithms use as their basic building block), it's not using it for the purposes of a total ordering like numbers and sequences use it. It's overloaded to test "is a proper subset of" (not including direct equality). So frozenset({1,2}) < frozenset({3,4}) is False, and so is frozenset({3,4}) > frozenset({1,2}).

Because the expected sort invariant is broken, sorting sequences of set-like objects produces implementation-specific and largely useless results. Uniquifying strategies based on sorting will typically fail under those conditions; one possible result is that it will find the sequence to be sorted in order or reverse order already (since each element is "less than" both the prior and subsequent elements); if it determines it to be in order, nothing changes, if it's in reverse order, it swaps the element order (but in this case that's indistinguishable from preserving order). Then it removes adjacent duplicates (since post-sort, all duplicates should be grouped together), finds none (the duplicates aren't adjacent), and returns the original data.
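A quick demonstration of the broken invariant. On CPython, sorted() decides the list is already in order because no element compares "less than" its neighbor, so the duplicates never become adjacent (the exact outcome is implementation-specific, as noted above):

```python
a = [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]

# Neither set is a proper subset of the other, so both comparisons are False:
print(frozenset({1, 2}) < frozenset({3, 4}))  # False
print(frozenset({1, 2}) > frozenset({3, 4}))  # False

# Timsort sees one "ascending" run and returns the list unchanged,
# so a sort-collapse-adjacent uniquifier would collapse nothing:
print(sorted(a) == a)  # True
```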

For frozensets, you probably want to use hash based uniquification, e.g. via set or (to preserve original order of appearance on Python 3.7+), dict.fromkeys; the latter would be simply:

a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = list(dict.fromkeys(a)) # Works on CPython/PyPy 3.6 as implementation detail, and on 3.7+ everywhere

It's also possible to use sort-based uniquification, but numpy.unique doesn't seem to support a key function, so it's easier to stick to Python built-in tools:

from itertools import groupby  # With no key argument, can be used much like uniq command line tool

a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = [k for k, _ in groupby(sorted(a, key=sorted))]

That second line is a little dense, so I'll break it up:

  1. sorted(a, key=sorted) - Returns a new list based on a where each element is sorted based on the sorted list form of the element (so the < comparison actually does put like with like)
  2. groupby(...) returns an iterator of key/group-iterator pairs. With no key argument to groupby, it just means each key is a unique value, and the group-iterator produces that value as many times as it was seen.
  3. [k for k, _ in ...] We don't care how many times each duplicate value was seen, so we ignore the group-iterator (assigning to _ means "ignored" by convention) and have the list comprehension produce only the keys (the unique values)

Sort unique values of a NumPy array in descending order by count

With pure numpy, you can use numpy.unique with return_counts=True, then numpy.argsort:

a = np.array([5,6,7,6,1,9,10,3,1,6])
b, c = np.unique(a, return_counts=True)
out = b[np.argsort(-c)]

output: array([ 6, 1, 3, 5, 7, 9, 10])
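One caveat: np.argsort's default sort is not stable, so the relative order of values with equal counts is not guaranteed. Passing kind='stable' makes ties fall back to ascending value order, a deterministic variant of the same idea:

```python
import numpy as np

a = np.array([5, 6, 7, 6, 1, 9, 10, 3, 1, 6])
b, c = np.unique(a, return_counts=True)

# kind='stable' keeps equal-count values in b's (ascending) order,
# so ties among the count-1 values resolve as 3, 5, 7, 9, 10
out = b[np.argsort(-c, kind='stable')]
print(out)  # [ 6  1  3  5  7  9 10]
```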

Keep unique values in an array based on another array while preserving order

You can use np.lexsort and np.unique:

idx = np.lexsort([distances, my_arr])
out = np.sort(idx[np.unique(my_arr[idx], return_index=True)[1]])
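Since my_arr and distances come from the question, here is a self-contained sketch with small assumed arrays. For each distinct value in my_arr it keeps the occurrence with the smallest distance, then returns those positions in their original order:

```python
import numpy as np

# assumed example data: value 3 occurs twice; index 2 has the smaller distance
my_arr    = np.array([3, 1, 3, 2])
distances = np.array([0.9, 0.2, 0.1, 0.5])

# sort primarily by value, secondarily by distance
idx = np.lexsort([distances, my_arr])

# first position of each value in that ordering = its minimum-distance occurrence
keep = idx[np.unique(my_arr[idx], return_index=True)[1]]

# restore the original order of the kept positions
out = np.sort(keep)
print(out)          # [1 2 3]
print(my_arr[out])  # [1 3 2]
```

Note that the duplicate 3 at index 0 (distance 0.9) is dropped in favor of the one at index 2 (distance 0.1).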

