Count Duplicates Between 2 Lists

Count duplicates between 2 lists

A shorter and better way:

>>> a = [1, 2, 9, 5, 1]
>>> b = [9, 8, 7, 6, 5]
>>> len(set(a) & set(b)) # & is intersection - elements common to both
2

Why your code doesn't work:

>>> def filter_(x, y):
...     count = 0
...     for num in y:
...         if num in x:
...             count += 1
...     return count
...
>>> filter_(a, b)
2

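Your return count was inside the for loop, so the function returned on the first iteration, before the loop had finished. It must come after the loop, at the same indentation as the for statement.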
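The original failing code is not shown here, so the following is only a hypothetical reconstruction of the bug (filter_buggy is an assumed name), placed next to the corrected function to show how the indentation of return changes the result:

```python
def filter_buggy(x, y):
    """Hypothetical reconstruction: `return` is indented into the loop body."""
    count = 0
    for num in y:
        if num in x:
            count += 1
        return count  # leaves the function during the first iteration


def filter_fixed(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
    return count  # reached only after every element of y was checked


a = [1, 2, 9, 5, 1]
b = [9, 8, 7, 6, 5]
print(filter_buggy(a, b))  # 1
print(filter_fixed(a, b))  # 2
```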

Best way to count duplicates between 2 lists

You can use collections.Counter, which supports the & operator (multiset intersection, keeping the minimum count of each element):

>>> from collections import Counter
>>> Counter([1,2,2,4]) & Counter([2,2,3,4]) # {2:2, 1:1, 4:1} AND {2:2, 3:1, 4:1}
Counter({2: 2, 4: 1})
>>> sum(_.values())
3

from collections import Counter

def another(lst1, lst2):
    return sum((Counter(lst1) & Counter(lst2)).values())
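As a rough illustration of why Counter is preferred over sets here, the sketch below contrasts the two on the same input. The name count_common is an assumption made for this example; it does the same thing as another() above.

```python
from collections import Counter


def count_common(lst1, lst2):
    """Hypothetical name; counts shared elements, respecting multiplicity."""
    return sum((Counter(lst1) & Counter(lst2)).values())


left = [1, 2, 2, 4]
right = [2, 2, 3, 4]

with_multiplicity = count_common(left, right)
distinct_only = len(set(left) & set(right))

print(with_multiplicity)  # 3: the value 2 is matched twice, 4 once
print(distinct_only)      # 2: sets collapse the repeated 2
```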

UPDATE

Here's the modified version1. You don't need to convert the set back to a list to access items by index; just iterate over the set:

def version1_modified(lst1, lst2):
    return sum(min(lst1.count(x), lst2.count(x)) for x in set(lst1))
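A quick sanity check, sketched under the assumption that both approaches are meant to compute the same number: the list.count version and a Counter version (counter_version is a name invented here) agree on a few small inputs. Note that list.count rescans the list on every call, so the Counter form is likely to scale better on long lists.

```python
from collections import Counter


def version1_modified(lst1, lst2):
    return sum(min(lst1.count(x), lst2.count(x)) for x in set(lst1))


def counter_version(lst1, lst2):
    # Hypothetical helper name; same computation via multiset intersection
    return sum((Counter(lst1) & Counter(lst2)).values())


samples = [
    ([1, 2, 2, 4], [2, 2, 3, 4]),
    ([], [1, 2]),
    (['a', 'a', 'a'], ['a', 'a']),
]
results = [(version1_modified(l, r), counter_version(l, r)) for l, r in samples]
print(results)  # [(3, 3), (0, 0), (2, 2)]
```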

Python 3 - counting matches in two lists (including duplicates)

You're seeing this problem because you're using sets as your collection type. Sets have two characteristics: they're unordered (which doesn't matter here), and their elements are unique. So you lose the duplicates in the lists when you convert them to sets, before you even find their intersection:

>>> p = ['1', '2', '3', '3', '3', '3', '3']
>>> set(p)
{'1', '2', '3'}

There are several ways you can do what you're looking to do here, but you'll want to start by looking at the list count method. I would do something like this:

>>> list1 = ['a', 'b', 'c']
>>> list2 = ['a', 'b', 'c', 'c', 'c']
>>> results = {}
>>> for i in list1:
...     results[i] = list2.count(i)
...
>>> results
{'a': 1, 'b': 1, 'c': 3}

This approach creates a dictionary (results), and for each element in list1, creates a key in results, counts the times it occurs in list2, and assigns that to the key's value.
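The same dictionary can be built without calling list2.count once per element. The sketch below is one possible variant, not the approach above: it counts list2 a single time with Counter and then looks each element up (counts_in_list2 is a name chosen for this example).

```python
from collections import Counter

list1 = ['a', 'b', 'c']
list2 = ['a', 'b', 'c', 'c', 'c']

# Count list2 once; a Counter returns 0 for keys it has never seen
counts_in_list2 = Counter(list2)
results = {item: counts_in_list2[item] for item in list1}

print(results)  # {'a': 1, 'b': 1, 'c': 3}
```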

Edit: As Lattyware points out, that approach solves a slightly different question from the one you asked. A really fundamental solution would look like this:

>>> words = ['red', 'blue', 'yellow', 'black']
>>> list1 = ['the', 'black', 'dog']
>>> list2 = ['the', 'blue', 'blue', 'dog']
>>> results1 = 0
>>> results2 = 0
>>> for w in words:
...     results1 += list1.count(w)
...     results2 += list2.count(w)
...

>>> results1
1
>>> results2
2

This works in a similar way to my first suggestion: it iterates through each word in your main list (here I use words), adds the number of times it appears in list1 to the counter results1, and adds the number of times it appears in list2 to results2.

If you need more information than just the number of duplicates, you'll want to use a dictionary or, even better, the specialized Counter type in the collections module. Counter is built to make everything I did in the examples above easy.

>>> from collections import Counter
>>> results3 = Counter()
>>> for w in words:
...     results3[w] = list2.count(w)
...

>>> results3
Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
>>> sum(results3.values())
2
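Counter can also do the counting itself if it is fed only the words of interest. This is a sketch of that variant, with wanted as an assumed helper name; unlike the loop above, it leaves out words that never occur instead of storing them with a count of 0.

```python
from collections import Counter

words = ['red', 'blue', 'yellow', 'black']
list2 = ['the', 'blue', 'blue', 'dog']

wanted = set(words)  # set membership tests are fast
results3 = Counter(w for w in list2 if w in wanted)

print(results3)                # Counter({'blue': 2})
print(sum(results3.values()))  # 2
print(results3['red'])         # 0, missing keys default to zero
```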

Comparing Python nested lists and count duplicates

dict_a = {row: 0 for row in list_a}
for row in list_b:
    if row in dict_a:
        dict_a[row] += 1

result = [row + (dict_a[row],) for row in list_a]

On Python 2.6 use dict((row, 0) for row in list_a) instead of the dictionary comprehension.
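The snippet above relies on list_a and list_b from the question, which are not shown here. As a minimal sketch with made-up data (rows must be hashable, so tuples are assumed), it can be run like this:

```python
# Hypothetical sample data; the real list_a and list_b are not shown
list_a = [('x', 1), ('y', 2), ('z', 3)]
list_b = [('x', 1), ('x', 1), ('z', 3), ('q', 9)]

# Start every row of list_a at zero, then count its occurrences in list_b
dict_a = {row: 0 for row in list_a}
for row in list_b:
    if row in dict_a:
        dict_a[row] += 1

# Append each row's count as an extra tuple element
result = [row + (dict_a[row],) for row in list_a]
print(result)  # [('x', 1, 2), ('y', 2, 0), ('z', 3, 1)]
```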

Python count duplicates over lists in list

>>> from collections import Counter
>>> ct = Counter([jtem for item in a for jtem in item])
>>> ct
Counter({2: 3, 1: 2, 3: 2, 4: 1, 5: 1})

OR

>>> from itertools import chain
>>> from collections import Counter
>>>
>>> ct = Counter(chain.from_iterable(a))
>>> ct
Counter({2: 3, 1: 2, 3: 2, 4: 1, 5: 1})
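The input list a is not shown above. Assuming a nested list such as the one below (chosen here only because it reproduces the counts printed above), the values that occur more than once can be pulled out of the Counter like this:

```python
from collections import Counter
from itertools import chain

# Assumed input; any list of lists of hashable items would work
a = [[1, 2, 3], [2, 3, 4], [1, 2, 5]]

ct = Counter(chain.from_iterable(a))

# Keep only the values seen more than once across all sublists
duplicates = {value: n for value, n in ct.items() if n > 1}
print(duplicates)  # {1: 2, 2: 3, 3: 2}
```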

This should help you.

Find duplicates between 2 columns (independent order) , count and drop Python

The following approach creates a new column containing a set of the values in the columns specified. The advantage is that all other columns are preserved in the final result. Furthermore, the indices are preserved the same way as in the expected output you posted:

import pandas as pd

df = pd.DataFrame([['A','B'],['D','B'],['B','A'],['B','C'],['C','B']],
                  columns=['source', 'target'])

# Create column with set of both columns
df['tmp'] = df.apply(lambda x: frozenset([x['source'], x['target']]), axis=1)

# Create count column based on new tmp column
df['count'] = df.groupby(['tmp'])['target'].transform('size')

# Drop duplicate rows based on new tmp column
df = df[~df.duplicated(subset='tmp', keep='first')]

# Remove tmp column
df = df.drop(columns='tmp')

df

Output:

  source target  count
0      A      B      2
1      D      B      1
3      B      C      2
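For readers without pandas, the same idea (treat each pair as an unordered frozenset, count it, keep the first occurrence) can be sketched with the standard library alone. The names edges, pair_counts and deduplicated are assumptions made for this example, and row indices are not tracked here.

```python
from collections import Counter

edges = [('A', 'B'), ('D', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B')]

# frozenset makes ('A', 'B') and ('B', 'A') compare equal
pair_counts = Counter(frozenset(edge) for edge in edges)

seen = set()
deduplicated = []
for source, target in edges:
    key = frozenset((source, target))
    if key not in seen:  # keep only the first occurrence of each pair
        seen.add(key)
        deduplicated.append((source, target, pair_counts[key]))

print(deduplicated)  # [('A', 'B', 2), ('D', 'B', 1), ('B', 'C', 2)]
```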

Python: How to count duplicates and compare nested sublist with another nested sublist?

This answer is given according to the output you've provided:

outputs = [[[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']], [[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']], [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']], [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']], [[5019, 'BSC2', 'ST3'], ['LH1'], ['FTM3']], [[6008, 'BSC3', 'ST1'], ['LB1'], ['WTM1']], [[4201, 'BSC1', 'ST1'], ['LH1'], ['THTM2']], [[4227, 'BSC1', 'ST4'], ['CR1'], ['WTM3']], [[4220, 'BSC2', 'ST5'], ['LH2'], ['THTM2']], [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']], [[6226, 'BSC3', 'ST6'], ['LH1'], ['FTM1']], [[5225, 'BSC2', 'ST6'], ['LB1'], ['THTM3']], [[5201, 'BSC2', 'ST2'], ['LH2'], ['FTM5']], [[4202, 'BSC1', 'ST3'], ['LH1'], ['THTM3']], [[4227, 'BSC1', 'ST4'], ['LH2'], ['THTM2']]]

Question #1: How do I count the number of duplicates in the output?

According to your examples, I assume you're looking for [module, crs, lec] duplicates:

# I cast tuple in order to be hashable in a set
module_mapper = map(lambda x: tuple(x[0]), outputs)
# Note: you can change the lists to tuples in your class to avoid the casting

# Sets allow only unique elements
unique_modules = set(module_mapper)

# number of duplicates
duplicate_counter = len(outputs) - len(unique_modules)

print(duplicate_counter) # result: 3
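If you also want to know which modules repeat, not just how many extra copies exist, a Counter over the same tuples gives both. This is a sketch on a shortened sample of the data above; sample, module_counts and repeated are names made up for the example.

```python
from collections import Counter

# Shortened sample taken from the outputs list above
sample = [
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
    [[5019, 'BSC2', 'ST3'], ['LH1'], ['FTM3']],
    [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
    [[6226, 'BSC3', 'ST6'], ['LH1'], ['FTM1']],
]

# Lists are unhashable, so each module is converted to a tuple first
module_counts = Counter(tuple(entry[0]) for entry in sample)
repeated = {module: n for module, n in module_counts.items() if n > 1}

# Every occurrence beyond the first counts as one duplicate
duplicate_counter = sum(n - 1 for n in repeated.values())
print(repeated)
print(duplicate_counter)  # 2
```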

Question #2: Check whether different classes are scheduled at the same time in the same room

The following returns a list of the different classes that share the same time and room:

# this is our condition
def filter_condition(x, y):
    return x != y and x[1:] == y[1:]


def filterer(classes, acc=None):
    # Use None instead of a mutable default so results don't leak between calls
    if acc is None:
        acc = []
    if classes:
        c, cs = classes[0], classes[1:]
        if c not in acc:
            filtered_classes = list(filter(lambda x: filter_condition(c, x), cs))
            if filtered_classes:
                acc.extend(filtered_classes + [c])
        return filterer(cs, acc)
    else:
        return acc

# results

print(filterer(outputs, []))
# [[[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']],
# [[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']],
# [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
# [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
# [[4227, 'BSC1', 'ST4'], ['LH2'], ['THTM2']],
# [[4220, 'BSC2', 'ST5'], ['LH2'], ['THTM2']]]
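An alternative to the recursive version is to group entries by their last two fields in a single pass. The sketch below assumes the second and third elements are the room and the time slot (the source only says "same time and same room"), and find_clashes is a hypothetical name; it returns a dict keyed by slot rather than a flat list.

```python
from collections import defaultdict

# Shortened sample taken from the outputs list above
sample = [
    [[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']],
    [[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
    [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
]


def find_clashes(classes):
    """Group modules by (room, time) and keep slots with 2+ distinct modules."""
    slots = defaultdict(list)
    for module, room, time in classes:
        # Assumption: room and time are single-element lists
        slots[(room[0], time[0])].append(module)
    return {
        slot: modules
        for slot, modules in slots.items()
        if len({tuple(m) for m in modules}) > 1
    }


clashes = find_clashes(sample)
for slot, modules in clashes.items():
    print(slot, modules)
```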

Final note: if you use Python 3.10 or later, you can replace the ifs with match/case for a cleaner look.


