How to Find Duplicate Elements in Array Using for Loop in Python

Finding duplicates in a sequence using only for loops in Python.

The idea: iterate over the list and, from the second element onwards, check whether the current element already appears earlier in the list.

To get the sub-list from index 0 up to (but not including) the current index, we use list slicing (my_list[:index]), and to check whether the current element exists in that sub-list, we use the in operator.

You can do the following:

In [1]: def has_duplicates(my_list):
   ...:     for index in range(len(my_list)):
   ...:         if index != 0:  # check only from the 2nd element onwards
   ...:             item = my_list[index]  # get the current item
   ...:             if item in my_list[:index]:  # is it in the list up to the current element?
   ...:                 return True  # a duplicate exists
   ...:     return False  # no duplicates exist
   ...:

In [2]: has_duplicates([1,2,3,4])
Out[2]: False

In [3]: has_duplicates([1,2,3,4,4])
Out[3]: True

In [4]: has_duplicates([1,1,2,3,4,4])
Out[4]: True

How do I find the duplicates in a list and create another list with them?

To remove duplicates, use set(a). To print the duplicates, do something like:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

Note that Counter is not particularly efficient and probably overkill here. set will perform better. This code computes a list of unique elements in source order:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

or, more concisely:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).
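A quick check of why the trick works at all: add() mutates the set and returns None, which is falsy, so the negation is always True and the comprehension's filter is driven entirely by the membership test:

```python
seen = set()
result = seen.add(42)   # add() mutates the set and returns None
print(result)           # None
print(not result)       # True, so `not seen.add(x)` never filters anything out
print(42 in seen)       # True: the element was added as a side effect
```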

To compute the list of duplicated elements without libraries:

seen = set()
dupes = []

for x in a:
    if x in seen:
        dupes.append(x)
    else:
        seen.add(x)

or, more concisely:

seen = set()
dupes = [x for x in a if x in seen or seen.add(x)]

If list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic time solution (compare each with each). For example:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print(no_dupes)  # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print(dupes)  # [[1], [3]]

Python: Iterate through a big array of BigInts, find the first duplicate and print out the indexes of the duplicate values

You can do something along these lines in Python:

Assume this list of 5 signatures (they could be ints or strings, but I have strings):

li = ['864205495604807476120572616017955259175325408501',
      '864205495604807476120572616017955259175325408502',
      '864205495604807476120572616017955259175325408503',
      '864205495604807476120572616017955259175325408501',
      '864205495604807476120572616017955259175325408502']

You can make a dict of lists with each list being the index of duplicates:

idx = {}
for i, sig in enumerate(li):
    idx.setdefault(sig, []).append(i)

If you make li 3,000,000 entries, that runs in about 550 milliseconds on my computer and likely would be similar on yours.
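To reproduce that timing yourself, a sketch along these lines works (the 3,000,000-entry list here is synthetic, built by cycling a small pool of made-up signatures, and the measured time will vary by machine):

```python
import timeit

# synthetic data: 3,000,000 signature strings cycling a pool of 10
base = 864205495604807476120572616017955259175325408500
li = [str(base + (i % 10)) for i in range(3_000_000)]

def build_index(signatures):
    """Map each signature to the list of indexes where it occurs."""
    idx = {}
    for i, sig in enumerate(signatures):
        idx.setdefault(sig, []).append(i)
    return idx

elapsed = timeit.timeit(lambda: build_index(li), number=1)
print(f'{elapsed * 1000:.0f} ms for {len(li):,} entries')
```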

You can then find the duplicates like so:

for sig, v in idx.items():
    if len(v) > 1:
        print(f'{sig}: {v}')

Prints:

864205495604807476120572616017955259175325408501: [0, 3]
864205495604807476120572616017955259175325408502: [1, 4]

If you just want the FIRST duplicate, you can modify the first loop like so:

idx = {}
for i, sig in enumerate(li):
    if sig in idx:
        print(f'Duplicate {sig} at {idx[sig]} and {i}')
        break
    else:
        idx[sig] = i

Prints:

Duplicate 864205495604807476120572616017955259175325408501 at 0 and 3

But to be honest, I don't understand why this is so much faster.

Yours is slow because it has O(n**2) complexity from the nested while loops: you loop over the entire array for each element. The method shown here loops over the entire list only once -- not 3 million times!

Identify duplicate values in a list in Python

These answers are O(n), so they take a little more code than using mylist.count(), but are much more efficient as mylist gets longer.
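For reference, the count()-based version that comparison refers to looks like this; list.count() rescans the whole list once per element, which is what makes it quadratic overall:

```python
mylist = [20, 30, 25, 20]

# mylist.count(x) walks the whole list for every x -> O(n**2) overall
dupes = sorted({x for x in mylist if mylist.count(x) > 1})
print(dupes)  # [20]
```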

If you just want to know the duplicates, use collections.Counter

from collections import Counter
mylist = [20, 30, 25, 20]
[k for k, v in Counter(mylist).items() if v > 1]

If you need to know the indices,

from collections import defaultdict
D = defaultdict(list)
for i, item in enumerate(mylist):
    D[item].append(i)
D = {k: v for k, v in D.items() if len(v) > 1}

Python: Iterate through list and remove duplicates (without using Set())

The issue is with "automatic" for loops: you have to be careful when you modify the sequence you are iterating over. Here's a solution that uses explicit while loops instead:

def remove_dup(a):
    i = 0
    while i < len(a):
        j = i + 1
        while j < len(a):
            if a[i] == a[j]:
                del a[j]
            else:
                j += 1
        i += 1

s = ['cat','dog','cat','mouse','dog']
remove_dup(s)
print(s)

Output: ['cat', 'dog', 'mouse']

This solution works in place, modifying the original list rather than creating a new one, and it doesn't use any extra data structures. The trade-off is quadratic running time from the nested loops.

Find duplicate in Array with single loop

Use any Set implementation, say HashSet<T>, e.g.

HashSet<int> hs = new HashSet<int>();
int[] Arr = { 9, 5, 6, 3, 8, 2, 5, 1, 7, 4 };

foreach (int item in Arr)
    if (hs.Contains(item)) {
        Console.WriteLine("duplicate found");
        // break; // <- uncomment this if you want one message only
    }
    else
        hs.Add(item);

Edit: since hs.Add returns a bool, shorter and more efficient code is possible:

HashSet<int> hs = new HashSet<int>();
int[] Arr = { 9, 5, 6, 3, 8, 2, 5, 1, 7, 4 };

foreach (int item in Arr)
    if (!hs.Add(item)) {
        Console.WriteLine("duplicate found");
        // break; // <- uncomment this if you want one message only
    }

How to tell how many duplicates in a list?

>>> a = ['a', 'a', 'b']
>>> a.count('a')
2
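count() scans the list once per call, so if you need counts for several distinct values, a single pass with collections.Counter (shown earlier) is cheaper. A small sketch, including one way to tally the total number of surplus occurrences:

```python
from collections import Counter

a = ['a', 'a', 'b']
counts = Counter(a)
print(counts['a'])  # 2

# total surplus occurrences (each value beyond its first appearance)
print(sum(v - 1 for v in counts.values() if v > 1))  # 1
```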

