Count duplicates between 2 lists
A shorter and better way:
>>> a = [1, 2, 9, 5, 1]
>>> b = [9, 8, 7, 6, 5]
>>> len(set(a) & set(b)) # & is intersection - elements common to both
2
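The set-based count above counts each shared value once, however often it repeats in either list. Here is a minimal sketch that makes this explicit. The function name count_common_distinct is hypothetical. set.intersection accepts any iterable, so only one list needs converting:

```python
def count_common_distinct(a, b):
    # set.intersection accepts any iterable, so b need not be converted first
    return len(set(a).intersection(b))

print(count_common_distinct([1, 2, 9, 5, 1], [9, 8, 7, 6, 5]))  # 2
# Repeats collapse: 1 appears twice in both lists but is counted once
print(count_common_distinct([1, 1, 2], [1, 1, 3]))  # 1
```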
Why your code doesn't work:
>>> def filter_(x, y):
...     count = 0
...     for num in y:
...         if num in x:
...             count += 1
...     return count
...
>>> filter_(a, b)
2
Your return count statement was inside the for loop, so the function returned before the loop had finished checking every element.
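Here is a sketch of the failure mode. The question's code isn't reproduced here, so filter_broken is a hypothetical reconstruction that assumes return count sat at the loop-body level:

```python
def filter_broken(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
        return count  # BUG (hypothetical reconstruction): exits during the first iteration

def filter_fixed(x, y):
    count = 0
    for num in y:
        if num in x:
            count += 1
    return count  # runs once, after every element of y has been checked

a = [1, 2, 9, 5, 1]
b = [9, 8, 7, 6, 5]
print(filter_broken(a, b))  # 1: only b[0] == 9 was checked
print(filter_fixed(a, b))   # 2
```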
Best way to count duplicates between 2 lists
You can use collections.Counter, which supports the & operation:
>>> from collections import Counter
>>> Counter([1,2,2,4]) & Counter([2,2,3,4]) # {2:2, 1:1, 4:1} AND {2:2, 3:1, 4:1}
Counter({2: 2, 4: 1})
>>> sum(_.values())
3
from collections import Counter

def another(lst1, lst2):
    return sum((Counter(lst1) & Counter(lst2)).values())
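another() counts each shared value as many times as it appears in both lists, which is the smaller of its two counts. If you also want to see which items were matched, Counter.elements() expands the intersection back into individual items. A short usage sketch:

```python
from collections import Counter

def another(lst1, lst2):
    # & keeps each shared key with min(count in lst1, count in lst2)
    return sum((Counter(lst1) & Counter(lst2)).values())

a = [1, 2, 2, 4]
b = [2, 2, 3, 4]
print(another(a, b))  # 3

# elements() repeats each key as many times as its count
common = Counter(a) & Counter(b)
print(sorted(common.elements()))  # [2, 2, 4]
```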
UPDATE
Here's a modified version of your version1. You don't need to convert the set back to a list to access items by index; just iterate over the items:
def version1_modified(lst1, lst2):
    return sum(min(lst1.count(x), lst2.count(x)) for x in set(lst1))
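version1_modified calls list.count for every distinct value, so each call rescans both lists. The Counter version makes a single pass over each list instead. Both compute the same sum of minimum counts. This sketch checks that they agree on a few inputs:

```python
from collections import Counter

def version1_modified(lst1, lst2):
    # for each distinct value, count matches up to the smaller multiplicity
    return sum(min(lst1.count(x), lst2.count(x)) for x in set(lst1))

def another(lst1, lst2):
    # same result, built from one pass over each list
    return sum((Counter(lst1) & Counter(lst2)).values())

cases = [
    ([1, 2, 2, 4], [2, 2, 3, 4]),
    ([1, 2, 9, 5, 1], [9, 8, 7, 6, 5]),
    (['a', 'a', 'a'], ['a', 'a']),
]
for lst1, lst2 in cases:
    assert version1_modified(lst1, lst2) == another(lst1, lst2)
    print(lst1, lst2, version1_modified(lst1, lst2))  # 3, then 2, then 2
```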
Python 3 - counting matches in two lists (including duplicates)
You're seeing this problem because you're using sets as your collection type. Sets have two characteristics: they're unordered (which doesn't matter here), and their elements are unique. So you lose the duplicates in the lists when you convert them to sets, before you even find their intersection:
>>> p = ['1', '2', '3', '3', '3', '3', '3']
>>> set(p)
{'1', '2', '3'}
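Here is a small sketch of what the set conversion loses. q is a hypothetical second list, not from the question. The set intersection keeps '3' only once, while list.count still sees every copy:

```python
p = ['1', '2', '3', '3', '3', '3', '3']
q = ['3', '3', '4']  # hypothetical second list

# set intersection: '3' survives only once
print(len(set(p) & set(q)))  # 1

# list.count still sees every copy
print(p.count('3'), q.count('3'))  # 5 2

# so the number of matched copies of '3' is min(5, 2)
print(min(p.count('3'), q.count('3')))  # 2
```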
There are several ways you can do what you're looking to do here, but you'll want to start by looking at the list count method. I would do something like this:
>>> list1 = ['a', 'b', 'c']
>>> list2 = ['a', 'b', 'c', 'c', 'c']
>>> results = {}
>>> for i in list1:
...     results[i] = list2.count(i)
...
>>> results
{'a': 1, 'b': 1, 'c': 3}
This approach creates a dictionary (results), and for each element in list1, creates a key in results, counts the times it occurs in list2, and assigns that count to the key's value.
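Because list2.count(i) scans all of list2 for every element of list1, this gets slow for large lists. This sketch builds the same dictionary from a single Counter pass over list2. Keys that are missing from the Counter read as 0:

```python
from collections import Counter

list1 = ['a', 'b', 'c']
list2 = ['a', 'b', 'c', 'c', 'c']

counts2 = Counter(list2)  # one pass over list2
# Counter returns 0 for missing keys instead of raising KeyError
results = {i: counts2[i] for i in list1}
print(results)  # {'a': 1, 'b': 1, 'c': 3}
```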
Edit: As Lattyware points out, that approach solves a slightly different question than the one you asked. A more basic solution would look like this:
>>> words = ['red', 'blue', 'yellow', 'black']
>>> list1 = ['the', 'black', 'dog']
>>> list2 = ['the', 'blue', 'blue', 'dog']
>>> results1 = 0
>>> results2 = 0
>>> for w in words:
...     results1 += list1.count(w)
...     results2 += list2.count(w)
...
>>> results1
1
>>> results2
2
This works in a similar way to my first suggestion: it iterates through each word in your main list (here I use words), adds the number of times it appears in list1 to the counter results1, and adds the number of times it appears in list2 to results2.
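The loop above rescans both lists once per word. An alternative is to walk each list once and test membership in a set of the words. This sketch assumes words has no repeats. The original loop would count a repeated word twice, but this version counts it once:

```python
words = ['red', 'blue', 'yellow', 'black']
list1 = ['the', 'black', 'dog']
list2 = ['the', 'blue', 'blue', 'dog']

word_set = set(words)  # constant-time membership tests
results1 = sum(1 for w in list1 if w in word_set)
results2 = sum(1 for w in list2 if w in word_set)
print(results1, results2)  # 1 2
```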
If you need more information than just the number of duplicates, you'll want to use a dictionary or, even better, the specialized Counter type in the collections module. Counter is built to make everything I did in the examples above easy.
>>> from collections import Counter
>>> results3 = Counter()
>>> for w in words:
...     results3[w] = list2.count(w)
...
>>> results3
Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
>>> sum(results3.values())
2
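You don't have to assign the keys one at a time with list2.count, which rescans list2 for every word. Instead, you can count list2 once and pass Counter a mapping. The dict comprehension below keeps the zero entries for words that never appear:

```python
from collections import Counter

words = ['red', 'blue', 'yellow', 'black']
list2 = ['the', 'blue', 'blue', 'dog']

c2 = Counter(list2)  # single pass over list2
# c2[w] is 0 for absent words, and reading it does not insert the key into c2
results3 = Counter({w: c2[w] for w in words})
print(results3)  # Counter({'blue': 2, 'red': 0, 'yellow': 0, 'black': 0})
print(sum(results3.values()))  # 2
print(results3.most_common(1))  # [('blue', 2)]
```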
Comparing Python nested lists and count duplicates
# rows must be tuples: hashable (for dict keys) and concatenable with a 1-tuple
dict_a = {row: 0 for row in list_a}
for row in list_b:
    if row in dict_a:
        dict_a[row] += 1
result = [row + (dict_a[row],) for row in list_a]
On Python 2.6, use dict((row, 0) for row in list_a) instead of the dictionary comprehension.
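The snippet above relies on the rows being tuples. If your rows are lists, convert them first, because lists aren't hashable. Here is a self-contained sketch. The list_a and list_b values are made-up sample data:

```python
list_a = [[1, 'x'], [2, 'y'], [3, 'z']]            # hypothetical data
list_b = [[1, 'x'], [1, 'x'], [3, 'z'], [4, 'w']]  # hypothetical data

# lists can't be dict keys, so convert each row to a tuple
rows_a = [tuple(row) for row in list_a]
rows_b = [tuple(row) for row in list_b]

dict_a = {row: 0 for row in rows_a}
for row in rows_b:
    if row in dict_a:
        dict_a[row] += 1

# append each row's number of occurrences in list_b as a final element
result = [row + (dict_a[row],) for row in rows_a]
print(result)  # [(1, 'x', 2), (2, 'y', 0), (3, 'z', 1)]
```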
Python count duplicates over lists in list
>>> from collections import Counter
>>> ct = Counter([jtem for item in a for jtem in item])
>>> ct
Counter({2: 3, 1: 2, 3: 2, 4: 1, 5: 1})
OR
>>> from itertools import chain
>>> from collections import Counter
>>>
>>> ct = Counter(chain.from_iterable(a))
>>> ct
Counter({2: 3, 1: 2, 3: 2, 4: 1, 5: 1})
This should help you.
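To go from counts to the duplicated values themselves, filter the Counter for counts above 1. The nested list a below is a hypothetical stand-in for the one in the question. It was chosen to reproduce the counts shown above:

```python
from collections import Counter
from itertools import chain

a = [[1, 2, 3], [2, 3, 4], [1, 2, 5]]  # hypothetical sample data

ct = Counter(chain.from_iterable(a))
print(ct)  # Counter({2: 3, 1: 2, 3: 2, 4: 1, 5: 1})

# keep only values that occur more than once across all sublists
duplicates = {value: n for value, n in ct.items() if n > 1}
print(duplicates)  # {1: 2, 2: 3, 3: 2}
```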
Find duplicates between 2 columns (independent order), count and drop Python
The following approach creates a temporary column containing a frozenset of the values in the specified columns, so the order of the two values no longer matters. The advantage is that all other columns are preserved in the final result. Furthermore, the indices are preserved the same way as in the expected output you posted:
import pandas as pd

df = pd.DataFrame([['A','B'],['D','B'],['B','A'],['B','C'],['C','B']],
                  columns=['source', 'target'])
# Create column with a frozenset of both columns (order-independent and hashable)
df['tmp'] = df.apply(lambda x: frozenset([x['source'], x['target']]), axis=1)
# Create count column based on new tmp column
df['count'] = df.groupby(['tmp'])['target'].transform('size')
# Drop duplicate rows based on new tmp column
df = df[~df.duplicated(subset='tmp', keep='first')]
# Remove tmp column (pandas 2.x no longer accepts axis as a positional argument)
df = df.drop(columns='tmp')
df
Output:
source target count
0 A B 2
1 D B 1
3 B C 2
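For comparison, the same frozenset idea works in plain Python without pandas. This is a substituted sketch, not the pandas method above. A Counter tallies each unordered pair, and a seen set keeps only the first row of each pair. It uses the same sample rows as the DataFrame:

```python
from collections import Counter

rows = [('A', 'B'), ('D', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B')]

# frozenset ignores order, so ('A', 'B') and ('B', 'A') share a key
pair_counts = Counter(frozenset(r) for r in rows)

seen = set()
result = []
for index, (source, target) in enumerate(rows):
    key = frozenset((source, target))
    if key not in seen:  # keep only the first row of each pair
        seen.add(key)
        result.append((index, source, target, pair_counts[key]))

for row in result:
    print(*row)
# 0 A B 2
# 1 D B 1
# 3 B C 2
```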
Python: How to count duplicates and compare nested sublist with another nested sublist?
This answer is based on the output you've provided:
outputs = [[[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']], [[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']], [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']], [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']], [[5019, 'BSC2', 'ST3'], ['LH1'], ['FTM3']], [[6008, 'BSC3', 'ST1'], ['LB1'], ['WTM1']], [[4201, 'BSC1', 'ST1'], ['LH1'], ['THTM2']], [[4227, 'BSC1', 'ST4'], ['CR1'], ['WTM3']], [[4220, 'BSC2', 'ST5'], ['LH2'], ['THTM2']], [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']], [[6226, 'BSC3', 'ST6'], ['LH1'], ['FTM1']], [[5225, 'BSC2', 'ST6'], ['LB1'], ['THTM3']], [[5201, 'BSC2', 'ST2'], ['LH2'], ['FTM5']], [[4202, 'BSC1', 'ST3'], ['LH1'], ['THTM3']], [[4227, 'BSC1', 'ST4'], ['LH2'], ['THTM2']]]
Question #1: How do I count the number of duplicates in the output?
According to your examples, I assume you're looking for [module, crs, lec] duplicates:
# Convert each module list to a tuple so it is hashable and can go in a set
module_mapper = map(lambda x: tuple(x[0]), outputs)
# Note: you can store tuples instead of lists in your class to avoid the conversion
# Sets keep only unique elements
unique_modules = set(module_mapper)
# number of duplicates
duplicate_counter = len(outputs) - len(unique_modules)
print(duplicate_counter)  # result: 3
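If you also need to know which modules are duplicated and how often, a Counter over the same tuples gives you that. The sketch below uses a subset of the rows from outputs above. The total of extra occurrences equals len(outputs) - len(unique_modules):

```python
from collections import Counter

# a subset of the rows of outputs from the question
outputs = [
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
    [[4227, 'BSC1', 'ST4'], ['CR1'], ['WTM3']],
    [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
    [[6226, 'BSC3', 'ST6'], ['LH1'], ['FTM1']],
    [[4227, 'BSC1', 'ST4'], ['LH2'], ['THTM2']],
    [[5019, 'BSC2', 'ST3'], ['LH1'], ['FTM3']],
]

module_counts = Counter(tuple(row[0]) for row in outputs)
duplicated = {m: n for m, n in module_counts.items() if n > 1}
print(duplicated)
# {(4210, 'BSC1', 'ST1'): 2, (4227, 'BSC1', 'ST4'): 2, (6226, 'BSC3', 'ST6'): 2}

# extra occurrences beyond the first copy of each module
print(sum(n - 1 for n in module_counts.values()))  # 3
```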
Question #2: Check if there is a different class at the same time and in the same room
The following code returns a list of the different classes that are at the same time and in the same room:
# this is our condition: a different class with the same remaining fields (x[1:])
def filter_condition(x, y):
    return x != y and x[1:] == y[1:]

def filterer(classes, acc=None):
    # avoid a shared mutable default argument
    if acc is None:
        acc = []
    if classes:
        c, cs = classes[0], classes[1:]
        if c not in acc:
            filtered_classes = list(filter(lambda x: filter_condition(c, x), cs))
            if filtered_classes:
                acc.extend(filtered_classes + [c])
        return filterer(cs, acc)
    else:
        return acc

# results
print(filterer(outputs, []))
# [[[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']],
# [[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']],
# [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
# [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
# [[4227, 'BSC1', 'ST4'], ['LH2'], ['THTM2']],
# [[4220, 'BSC2', 'ST5'], ['LH2'], ['THTM2']]]
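Here is a non-recursive alternative that uses a different technique from filterer. It groups classes in one pass with a dict keyed on the remaining fields, instead of comparing pairs of classes, and any key with more than one distinct class is a clash. It avoids Python's recursion limit on long lists, though it returns groups rather than a flat list. The sketch assumes element 1 is the room and element 2 the time slot, and it runs on a few rows from outputs. The name clashes is hypothetical:

```python
from collections import defaultdict

outputs = [
    [[6224, 'BSC1', 'ST4'], ['LR1'], ['MTM3']],
    [[4222, 'BSC1', 'ST6'], ['LR1'], ['MTM3']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['TTM2']],
    [[4210, 'BSC1', 'ST1'], ['CR1'], ['FTM3']],
    [[6226, 'BSC3', 'ST6'], ['CR1'], ['FTM3']],
]

def clashes(classes):
    # key: (room, time) as hashable tuples (assumed field meaning); value: classes there
    groups = defaultdict(list)
    for cls in classes:
        key = (tuple(cls[1]), tuple(cls[2]))
        if cls not in groups[key]:  # skip exact repeats, like x != y in filter_condition
            groups[key].append(cls)
    return {key: group for key, group in groups.items() if len(group) > 1}

for key, group in clashes(outputs).items():
    print(key, [g[0][0] for g in group])
# (('LR1',), ('MTM3',)) [6224, 4222]
# (('CR1',), ('FTM3',)) [4210, 6226]
```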
Final note: if you use Python 3.10 or later, you can replace the ifs with match/case to make this look cleaner.