Difference Between Two Lists with Duplicates in Python

You didn't specify whether order matters. If it does not, you can do this in Python 2.7 or later:

l1 = ['a', 'b', 'c', 'b', 'c']
l2 = ['a', 'b', 'c', 'b']

from collections import Counter

c1 = Counter(l1)
c2 = Counter(l2)

diff = c1-c2
print(list(diff.elements()))  # -> ['c']
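
If order does matter, one possible variation is to walk the first list and consume matching counts from the second. This is only a sketch, not part of the original answer; the helper name ordered_difference is hypothetical.

```python
from collections import Counter

def ordered_difference(first, second):
    """Drop one occurrence from `first` for each item in `second`, keeping order."""
    remaining = Counter(second)  # how many of each item may still be cancelled
    result = []
    for item in first:
        if remaining[item] > 0:
            remaining[item] -= 1  # cancel this occurrence against `second`
        else:
            result.append(item)   # nothing left to cancel, so keep it
    return result

l1 = ['a', 'b', 'c', 'b', 'c']
l2 = ['a', 'b', 'c', 'b']
print(ordered_difference(l1, l2))  # -> ['c']
```

The earliest occurrences are the ones cancelled, so surviving items keep their relative order from the first list.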

Get difference between two lists with Unique Entries

To get elements which are in temp1 but not in temp2 (assuming uniqueness of the elements in each list):

In [5]: list(set(temp1) - set(temp2))
Out[5]: ['Four', 'Three']

Beware that set difference is asymmetric:

In [5]: set([1, 2]) - set([2, 3])
Out[5]: set([1])

where you might expect/want it to equal set([1, 3]). If you do want set([1, 3]) as your answer, you can use set([1, 2]).symmetric_difference(set([2, 3])).
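
As a small illustration of the asymmetry, the sketch below uses the same values with set literals and the ^ operator, which is equivalent to symmetric_difference for two sets. The variable names are only for illustration.

```python
left = {1, 2}
right = {2, 3}

only_left = left - right    # items in left that are missing from right
only_right = right - left   # items in right that are missing from left
either = left ^ right       # items in exactly one of the two sets

print(only_left)   # {1}
print(only_right)  # {3}
print(either == left.symmetric_difference(right))  # True
```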

How can I compare two lists in Python and return matches

This is not the most efficient approach, but by far the most obvious way to do it is:

>>> a = [1, 2, 3, 4, 5]
>>> b = [9, 8, 7, 6, 5]
>>> set(a) & set(b)
{5}

If order is significant, you can do it with a list comprehension like this:

>>> [i for i, j in zip(a, b) if i == j]
[5]

(This only works for equal-sized lists, which order significance implies.)
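
If what you want is the matches in the order they appear in the first list, regardless of position, a middle ground between the two approaches above is a set lookup inside a list comprehension. This is a sketch with hypothetical lists, not the data from the question.

```python
a = [3, 1, 2, 5]
b = [5, 9, 3]

b_set = set(b)  # build once so each membership test is fast
matches = [x for x in a if x in b_set]

print(matches)  # [3, 5]
```

Unlike the zip version, this does not require the lists to have the same length, and unlike the plain set intersection, it keeps the ordering of a.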

Intersection of two lists including duplicates?

You can use collections.Counter for this, which will provide the lowest count found in either list for each element when you take the intersection.

from collections import Counter

c = list((Counter(a) & Counter(b)).elements())

Outputs:

[1, 1, 2, 3, 4]
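
For a self-contained run, here is a sketch with made-up inputs (the lists from the original question are not shown above, so a and b below are assumptions). The result is sorted only to make the printed output independent of iteration order.

```python
from collections import Counter

# Hypothetical inputs; not the data from the original question.
a = [1, 1, 2, 3, 4, 4]
b = [1, 1, 1, 2, 3, 4, 5]

common = Counter(a) & Counter(b)    # per element: min(count in a, count in b)
result = sorted(common.elements())  # expand the counts back into a list

print(result)  # [1, 1, 2, 3, 4]
```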

How to compare 2 lists and remove duplicates from 1 efficiently?

Convert to set, remove elements, then convert back to list.

s1 = set(array1)
s2 = set(array2)
array2 = list(s2.difference(s1))

Edit: To keep track of duplicates, you can use collections.Counter and reconstruct the list.

from collections import Counter

s1 = set(array1)
array2 = [x for x in array2 if x not in s1]
# d2 = Counter(array2)
# array2 = [z for k, v in d2.items() if k not in s1 for z in [k] * v]

EDIT2: I thought using Counter would be faster, but the secondary list construction in the comprehension seems to nullify any gains. You are better off just making the first set, then using that for existence checks.
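
To make the difference between the two approaches concrete, here is a small sketch on hypothetical data. The helper name remove_seen is made up for this example.

```python
def remove_seen(source, target):
    """Return the items of `target` that do not occur in `source`."""
    seen = set(source)  # one set, reused for every existence check
    return [x for x in target if x not in seen]

array1 = [1, 2, 3]
array2 = [3, 4, 4, 5, 1, 6, 6]

print(remove_seen(array1, array2))         # [4, 4, 5, 6, 6]
print(sorted(set(array2) - set(array1)))   # [4, 5, 6]
```

The comprehension keeps both the duplicates and the original order of array2, while the pure set difference keeps neither.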

Test: Counter and double comprehension

%%timeit
array1 = [random.randint(0, 10000) for _ in range(200000)]
array2 = [random.randint(0, 20000) for _ in range(200000)]
s1 = set(array1)
d2 = Counter(array2)
[z for k, v in d2.items() if k not in s1 for z in [k]*v]

# returns:
525 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Test: single comprehension with existence check

%%timeit
array1 = [random.randint(0, 10000) for _ in range(200000)]
array2 = [random.randint(0, 20000) for _ in range(200000)]
s1 = set(array1)
#d2 = Counter(array2)
[x for x in array2 if x not in s1]

# returns:
510 ms ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Compare two lists of floats where order and duplicates matter in Python

If I understand correctly, this should work:

sum(a != b for a, b in zip(listA, listB))

Gives expected output of 2.

Note that because your problem description states that order is important, sets will be no use here as they are not ordered.
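
Since the lists hold floats, exact comparison with != can report a mismatch for values that differ only by rounding error. If that matters for your data, one option is math.isclose. The sketch below uses hypothetical lists and the default tolerance, which may not suit every case.

```python
import math

# Hypothetical data; 0.1 + 0.2 is not exactly equal to 0.3 in binary floating point.
listA = [0.1 + 0.2, 1.0, 2.5]
listB = [0.3, 1.0, 2.0]

exact = sum(a != b for a, b in zip(listA, listB))
tolerant = sum(not math.isclose(a, b) for a, b in zip(listA, listB))

print(exact)     # 2
print(tolerant)  # 1
```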

How do I subtract two lists with non-unique elements in Python?

If the order is not important, you can make Counters from the lists and subtract them.

from collections import Counter

list1 = ['a', 'c', 'a', 'b']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']

final = Counter(list2) - Counter(list1)

print(list(final.elements())) # -> ['a', 'a', 'c', 'd', 'e', 'f']

It's being used as a multiset.

There are some caveats to "order is not important": dicts in Python 3.7+ preserve insertion order, and Counter is a dict subclass, which is why the output here is ordered.
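
One more property of this multiset behaviour worth knowing: the - operator discards counts that fall to zero or below, so the direction of the subtraction matters. The sketch below reuses the same lists and contrasts - with Counter.subtract, which keeps zero and negative counts.

```python
from collections import Counter

list1 = ['a', 'c', 'a', 'b']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']

forward = Counter(list2) - Counter(list1)   # what list2 has beyond list1
backward = Counter(list1) - Counter(list2)  # what list1 has beyond list2

print(sorted(forward.elements()))  # ['a', 'a', 'c', 'd', 'e', 'f']
print(list(backward.elements()))   # []

signed = Counter(list1)
signed.subtract(list2)             # in place; keeps zero and negative counts
print(signed['a'], signed['b'], signed['d'])  # -2 0 -1
```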


