numpy.unique with order preserved
np.unique() sorts its input, so it is slow, O(N log N), but you can still get the unique values in their original order with the following code:
import numpy as np
a = np.array(['b','a','b','b','d','a','a','c','c'])
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])
output:
['b' 'a' 'd' 'c']
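The same idea can be wrapped in a small helper. This is only a minimal sketch: the name unique_in_order is hypothetical and is not part of NumPy.

```python
import numpy as np

def unique_in_order(values):
    """Return the unique elements of a 1-D array in order of first appearance."""
    arr = np.asarray(values)
    # For each (sorted) unique value, return_index gives the index of its first occurrence
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those indices restores the original order of appearance
    return arr[np.sort(first_idx)]

a = np.array(['b', 'a', 'b', 'b', 'd', 'a', 'a', 'c', 'c'])
result = unique_in_order(a)
print(result)
```

The sort is still there, so the cost stays O(N log N); the helper only hides the index bookkeeping.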
pd.unique() is much faster for big arrays, O(N), and it returns the values in order of appearance instead of sorting them:
import pandas as pd
a = np.random.randint(0, 1000, 10000)
%timeit np.unique(a)
%timeit pd.unique(a)
1000 loops, best of 3: 644 us per loop
10000 loops, best of 3: 144 us per loop
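To see the ordering difference rather than the timing difference, here is a small sketch on a hand-made array (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

a = np.array([4, 2, 1, 3, 1, 2, 3, 4])

in_order = pd.unique(a)        # hash-based: keeps order of first appearance
sorted_unique = np.unique(a)   # sort-based: returns sorted values

print(in_order)
print(sorted_unique)
```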
numpy unique without sort
You can do this with the return_index parameter:
>>> import numpy as np
>>> a = [4,2,1,3,1,2,3,4]
>>> np.unique(a)
array([1, 2, 3, 4])
>>> indexes = np.unique(a, return_index=True)[1]
>>> [a[index] for index in sorted(indexes)]
[4, 2, 1, 3]
Retain order when taking unique rows in a NumPy array
Using return_index
_,idx=np.unique(stacked, axis=0,return_index=True)
stacked[np.sort(idx)]
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [15, 16, 17],
       [18, 19, 20],
       [ 4,  5,  5]])
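The definition of stacked is not shown above, so here is a self-contained sketch of the same technique on a small hypothetical array with repeated rows:

```python
import numpy as np

# Hypothetical input; the original 'stacked' array is not shown in the answer
stacked = np.array([[3, 4, 5],
                    [0, 1, 2],
                    [3, 4, 5],
                    [0, 1, 2],
                    [6, 7, 8]])

# axis=0 treats each row as one element; idx holds the first occurrence of each unique row
_, idx = np.unique(stacked, axis=0, return_index=True)

# Sorting the indices puts the unique rows back in their original order
unique_rows = stacked[np.sort(idx)]
print(unique_rows)
```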
Baffled by numpy.unique()
From matplotlib documentation, paragraph "Plotting multiple sets of data":
"If x and/or y are 2D arrays a separate data set will be drawn for every column. If both x and y are 2D, they must have the same shape. If only one of them is 2D with shape (N, m) the other must have length N and will be used for every data set m."
It is not explicitly written that all sublists must have the same length, but the documentation refers only to 2D arrays and not to ragged nested sequences. To understand the behavior of plt.plot, just imagine that x and y will be cast into numpy arrays. In your second case, since y_lst contains lists with different lengths, this conversion cannot be made.
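The failed conversion can be reproduced with NumPy alone. This is a rough illustration, assuming the cast behaves like np.array with a float dtype, which simplifies what matplotlib does internally; the two lists are made up.

```python
import numpy as np

regular = [[1.0, 2.0], [3.0, 4.0]]   # every sublist has length 2
ragged = [[1.0, 2.0], [3.0]]         # sublists of different lengths

# A rectangular nested list becomes a proper 2D array
regular_shape = np.array(regular, dtype=float).shape
print(regular_shape)

try:
    np.array(ragged, dtype=float)
    converted = True
except ValueError:
    # No rectangular float array can hold rows of different lengths
    converted = False
print(converted)
```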
So I would go for something like this:
plt.figure(figsize=(7, 4))
for r in np.linspace(1, 4, 100):
    x = np.unique(logistic_calc(r, N))
    # a little bit tricky: [x] is a 2D array with one row, so each value is its own data set at r
    plt.plot([r], [x], '.', ms=.5, c="royalblue")
    # OR
    # plt.plot([r] * len(x), x, '.', ms=.5, c="royalblue")
...
plt.show()
numpy.unique has the problem with frozensets
numpy.unique operates by sorting, then collapsing runs of identical elements. Per the docstring:
Returns the sorted unique elements of an array.
The "sorted" part implies it's using a sort-collapse-adjacent technique (similar to what the *NIX sort | uniq pipeline accomplishes).
The problem is that while frozenset does define __lt__ (the overload for <, which most Python sorting algorithms use as their basic building block), it's not using it for the purposes of a total ordering like numbers and sequences use it. It's overloaded to test "is a proper subset of" (not including direct equality). So frozenset({1,2}) < frozenset({3,4}) is False, and so is frozenset({1,2}) > frozenset({3,4}), even though the two sets are not equal.
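A quick check of those comparisons, using the same two sets plus one real subset for contrast:

```python
a = frozenset({1, 2})
b = frozenset({3, 4})

less = a < b               # False: a is not a proper subset of b
greater = a > b            # False: a is not a proper superset of b
equal = a == b             # False: yet they are not equal either
subset = frozenset({1}) < a  # True: {1} is a proper subset of {1, 2}

print(less, greater, equal, subset)
```

For numbers, exactly one of <, > and == holds for any pair; here none of them does, which is what breaks sort-based algorithms.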
Because the expected sort invariant is broken, sorting sequences of set-like objects produces implementation-specific and largely useless results. Uniquifying strategies based on sorting will typically fail under those conditions. One possible result is that the sort finds the sequence to be already sorted, in order or in reverse order, because the comparisons never report one element as less than another. If it determines the sequence to be in order, nothing changes; if it's in reverse order, it swaps the element order (but in this case that's indistinguishable from preserving order). Then it removes adjacent duplicates (since post-sort, all duplicates should be grouped together), finds none (the duplicates aren't adjacent), and returns the original data.
For frozensets, you probably want to use hash-based uniquification, e.g. via set or (to preserve original order of appearance on Python 3.7+) dict.fromkeys; the latter would be simply:
a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = list(dict.fromkeys(a)) # Works on CPython/PyPy 3.6 as implementation detail, and on 3.7+ everywhere
It's also possible to use sort-based uniquification, but numpy.unique doesn't seem to support a key function, so it's easier to stick to Python built-in tools:
from itertools import groupby # With no key argument, can be used much like uniq command line tool
a = [frozenset({1,2}),frozenset({3,4}),frozenset({1,2})]
uniqa = [k for k, _ in groupby(sorted(a, key=sorted))]
That last line is a little dense, so I'll break it up:
- sorted(a, key=sorted) returns a new list based on a, with the elements ordered by the sorted list form of each element (so the < comparison actually does put like with like)
- groupby(...) returns an iterator of key/group-iterator pairs. With no key argument to groupby, each key is a unique value, and the group-iterator produces that value as many times as it was seen.
- [k for k, _ in ...]: since we don't care how many times each duplicate value was seen, we ignore the group-iterator (assigning to _ means "ignored" by convention) and have the list comprehension produce only the keys (the unique values)
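The three steps can also be run one at a time to inspect the intermediate values. The names ordered and runs are introduced here only for illustration:

```python
from itertools import groupby

a = [frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2})]

# Step 1: order by each set's sorted-list form, so equal sets end up adjacent
ordered = sorted(a, key=sorted)

# Step 2: groupby collapses each run of equal values; record how long each run is
runs = [(key, len(list(group))) for key, group in groupby(ordered)]

# Step 3: keep only the keys, dropping the run lengths
uniqa = [key for key, _ in runs]

print(runs)
print(uniqa)
```

Note that this returns the unique sets in sorted order, not in order of first appearance, unlike the dict.fromkeys approach.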
numpy array -- sort descending unique values base count of values
With pure numpy, you can use numpy.unique with return_counts=True, then numpy.argsort:
a = np.array([5,6,7,6,1,9,10,3,1,6])
b, c = np.unique(a, return_counts=True)
out = b[np.argsort(-c)]
output: array([ 6, 1, 3, 5, 7, 9, 10])
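One caveat: the default sort used by numpy.argsort is not guaranteed to be stable, so the relative order of values with equal counts is not guaranteed. A variant that pins it down, assuming you want ties to stay in ascending value order, passes kind="stable":

```python
import numpy as np

a = np.array([5, 6, 7, 6, 1, 9, 10, 3, 1, 6])

# values come back sorted ascending, with their occurrence counts
values, counts = np.unique(a, return_counts=True)

# Negating the counts sorts descending; a stable sort keeps tied values in ascending order
order = np.argsort(-counts, kind="stable")
out = values[order]

print(out)
print(counts[order])
```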
Keep unique values in an array based on another array while preserving order
You can use np.lexsort and np.unique:
idx = np.lexsort([distances, my_arr])
out = np.sort(idx[np.unique(my_arr[idx], return_index=True)[1]])
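Here is the same computation spelled out on made-up data. The arrays below are hypothetical; the sketch relies on np.lexsort using its last key (my_arr) as the primary key and distances to break ties, so the first entry of each group is the one with the smallest distance.

```python
import numpy as np

# Hypothetical inputs: values and a distance attached to each position
my_arr = np.array([3, 1, 3, 2, 1])
distances = np.array([0.5, 0.9, 0.2, 0.4, 0.1])

# Sort by value first, then by distance within equal values
idx = np.lexsort([distances, my_arr])

# First position of each unique value in the sorted view = its smallest distance
_, first = np.unique(my_arr[idx], return_index=True)

# Map back to original positions and restore the original order
keep = np.sort(idx[first])

print(keep)
print(my_arr[keep], distances[keep])
```

The result is an array of positions, so it can index my_arr, distances, or any other array aligned with them.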