How to Efficiently Compare Two Unordered Lists (Not Sets)

How to efficiently compare two unordered lists (not sets)?

O(n): The Counter() method is best (if your objects are hashable):

def compare(s, t):
return Counter(s) == Counter(t)

O(n log n): The sorted() method is next best (if your objects are orderable):

def compare(s, t):
return sorted(s) == sorted(t)

O(n * n): If the objects are neither hashable, nor orderable, you can use equality:

def compare(s, t):
t = list(t) # make a mutable copy
try:
for elem in s:
t.remove(elem)
except ValueError:
return False
return not t

How to compare two lists with different order?

You need to compare whether both lists result in the same Counter.

>>> from collections import Counter
>>> list_a = ['one', 'two', 'three']
>>> list_b = ['three', 'one', 'two']
>>> Counter(list_a) == Counter(list_b)
True
>>> list_b = ['three', 'one', 'two', 'two']
>>> Counter(list_a) == Counter(list_b)
False
>>> set(list_a) == set(list_b)
True # False positive

Another solution would be to sort both lists and then compare them. The Counter approach should be more efficient for big lists since it has linear time complexity (i.e. O(n)) while sorting is only pseudo-linear (i.e. O(n*log(n)).

Check if two unordered lists are equal

Python has a built-in datatype for an unordered collection of (hashable) things, called a set. If you convert both lists to sets, the comparison will be unordered.

set(x) == set(y)

Documentation on set


EDIT: @mdwhatcott points out that you want to check for duplicates. set ignores these, so you need a similar data structure that also keeps track of the number of items in each list. This is called a multiset; the best approximation in the standard library is a collections.Counter:

>>> import collections
>>> compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
>>>
>>> compare([1,2,3], [1,2,3,3])
False
>>> compare([1,2,3], [1,2,3])
True
>>> compare([1,2,3,3], [1,2,2,3])
False
>>>

Determine if 2 lists have the same elements, regardless of order?

You can simply check whether the multisets with the elements of x and y are equal:

import collections
collections.Counter(x) == collections.Counter(y)

This requires the elements to be hashable; runtime will be in O(n), where n is the size of the lists.

If the elements are also unique, you can also convert to sets (same asymptotic runtime, may be a little bit faster in practice):

set(x) == set(y)

If the elements are not hashable, but sortable, another alternative (runtime in O(n log n)) is

sorted(x) == sorted(y)

If the elements are neither hashable nor sortable you can use the following helper function. Note that it will be quite slow (O(n²)) and should generally not be used outside of the esoteric case of unhashable and unsortable elements.

def equal_ignore_order(a, b):
""" Use only when elements are neither hashable nor sortable! """
unmatched = list(b)
for element in a:
try:
unmatched.remove(element)
except ValueError:
return False
return not unmatched

How do I compare two unordered lists of lists with unordered lists inside of them?

You could simply sort the elements in the sublists, sort the sublists and compare the two lists.

>>> dummy_list_A =  [['A'], ['B'], ['C', 'D']]
>>> dummy_list_B = [['B'], ['A'], ['D', 'C']]
>>> sorted(map(sorted, dummy_list_A))
[['A'], ['B'], ['C', 'D']]
>>> sorted(map(sorted, dummy_list_B))
[['A'], ['B'], ['C', 'D']]
>>> def signature(l):
... return sorted(map(sorted, l))
...
>>> signature(dummy_list_A) == signature(dummy_list_B)
True

Compare list of tuples - Equals no matter position

Sort the lists, then compare them:

a = [(1,2,3),(4,5,6)]
b = [(4,5,6),(1,2,3)]
sorted(a)==sorted(b)
# True

Compare unordered list of dynamic dicts

Here's a solution which should work generally, even when the list has multiples of the same dictionary, and when the dictionaries in one list can have common keys. The idea is to convert the dictionaries into a canonical, hashable form, and then compare them as bags (sets which can contain multiples) using Counter.

It does assume that the dictionary keys are comparable and the dictionary values are hashable, so it won't work if you have dictionaries with incomparable keys or unhashable values.

from collections import Counter

def dict_to_canonical_hashable(d):
return tuple(sorted(d.items()))

def unordered_lists_equal(a, b):
canonical_a = Counter(map(dict_to_canonical_hashable, a))
canonical_b = Counter(map(dict_to_canonical_hashable, b))
return canonical_a == canonical_b

Tests:

>>> unordered_lists_equal(dict_list_1, dict_list_2)
True
>>> unordered_lists_equal(dict_list_1, dict_list_3)
False
>>> unordered_lists_equal(dict_list_2, dict_list_3)
False
>>> unordered_lists_equal([{1: 2, 3: 4}, {5: 6}], [{1: 2}, {3: 4, 5: 6}])
False
>>> unordered_lists_equal([{1: 2}, {1: 3}], [{1: 3}, {1: 2}])
True
>>> unordered_lists_equal([{1: 2}, {1: 2}], [{1: 2}])
False
>>> unordered_lists_equal([{1: 2}, {1: 2}], [{1: 2}, {1: 2}])
True

Best method to check whether the contents of two lists are same?

You already have one of the better methods to determine if the content in both lists is identical.

If your condition is that the content must be the same, but the order is optional, then using sort() and comparing them is a perfectly good solution.

Or you could do a method that does not involve sorting both lists and then also comparing them. This assumes the lists contain ints. But something similar could be done for other data types.

Using Counter you do not need to sort them, and you can make sure they have the same number of each element.

>>> from collections import Counter
>>> a = [1,2,3,4]
>>> b = [4,3,2,1]
>>> Counter(a) == Counter(b)
True

Pythonic Way to compare two unordered lists by attributes

It is easy, you should use sets:

if set(list1).difference(set(list2)):
# lists are different
# different_items = set(list1).difference(set(list2))
pass
else:
# lists are the same
pass

You can convert your structure to iterables or lists:

list1 = [(i.CRC, basename(i.filename)) for i in old.infolist()]
list2 = [(i.CRC, basename(i.filename)) for i in new.infolist()]

How to efficiently compare two sets of unordered vectors (`np.ndarray`)?

Here's a function that works using np.lexsort:

def is_symmetric(A, R, *args, **kwargs):
A = np.asanyarray(A)
A = A[np.lexsort(A.T)]

A_t = A @ np.asanyarray(R).T
A_t = A_t[np.lexsort(A_t.T)]
return np.allclose(A, A_t, *args, **kwargs)

Some results:

R = np.array([[0, -1, 0],  # 90° rotation as an example
[1, 0, 0],
[0, 0, 1]])

is_symmetric([[0, 0, 0]], R)
# True
is_symmetric([[1, 0, 0],
[0, 1, 0],
[0, 0, 0]], R)
# False
is_symmetric([[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
[-1, 0, 0]], R)
# False
is_symmetric([[1, 0, 0],
[0, 1, 0],
[0, 0, 0],
[-1, 0, 0],
[0, -1, 0]], R)
# True

Performance seems fine for 100000 random vectors:

A = np.random.rand(100000, 3)
%timeit is_symmetric(A, R)
# 82.2 ms ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


Related Topics



Leave a reply



Submit