Identify Duplicates in a List

How do I find the duplicates in a list and create another list with them?

To remove duplicates use set(a). To print duplicates, something like:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

Note that Counter is not particularly efficient (timings) and is probably overkill here; a set will perform better. This code computes a list of the unique elements in source order:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

or, more concisely:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]

I don't recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).
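
If the goal is only an order-preserving list of unique elements, a more readable one-liner is possible. A minimal sketch, assuming Python 3.7 or later (where dicts preserve insertion order) and hashable elements:

```python
a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]

# dict keys are unique and keep first-seen order (Python 3.7+),
# so this gives the same result as the explicit loop above.
uniq = list(dict.fromkeys(a))
print(uniq)  # [1, 2, 3, 5, 6]
```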

To compute the list of duplicated elements without libraries:

seen = set()
dupes = []

for x in a:
    if x in seen:
        dupes.append(x)
    else:
        seen.add(x)

or, more concisely:

seen = set()
dupes = [x for x in a if x in seen or seen.add(x)]
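
Note that both versions above append an element every time it reappears, so with the sample list a the result is [2, 1, 5, 5, 5]. If each duplicate should be reported only once, a second set can track what has already been reported. A minimal sketch; the function name find_duplicates is purely illustrative:

```python
def find_duplicates(items):
    """Return each duplicated element once, in order of its first repeat."""
    seen = set()      # everything encountered so far
    reported = set()  # duplicates already added to the result
    dupes = []
    for x in items:
        if x in seen:
            if x not in reported:
                dupes.append(x)
                reported.add(x)
        else:
            seen.add(x)
    return dupes

a = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
print(find_duplicates(a))  # [2, 1, 5]
```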

If list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic time solution (compare each with each). For example:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print(no_dupes) # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print(dupes) # [[1], [3]]
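
If the unhashable elements are flat lists like these, one way to avoid the quadratic scan is to convert each one to a tuple, which is hashable, and use that as the set key. A sketch under that assumption (it would not work as written for lists that themselves contain lists or dicts):

```python
a = [[1], [2], [3], [1], [5], [3]]

seen = set()
no_dupes = []
dupes = []
for x in a:
    key = tuple(x)  # hashable stand-in for the list
    if key in seen:
        dupes.append(x)
    else:
        seen.add(key)
        no_dupes.append(x)

print(no_dupes)  # [[1], [2], [3], [5]]
print(dupes)     # [[1], [3]]
```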

Identify duplicates in a List

The add method of Set returns a boolean indicating whether the value was newly added: true if the set did not already contain it, false if it did (see the Set documentation).

So just iterate through all the values:

public Set<Integer> findDuplicates(List<Integer> listContainingDuplicates) {
    final Set<Integer> setToReturn = new HashSet<>();
    final Set<Integer> set1 = new HashSet<>();

    for (Integer yourInt : listContainingDuplicates) {
        if (!set1.add(yourInt)) {
            setToReturn.add(yourInt);
        }
    }
    return setToReturn;
}

Identify duplicate values in a list in Python

These answers are O(n), so they take a little more code than using mylist.count() but are much more efficient as mylist gets longer.

If you just want to know the duplicates, use collections.Counter:

from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]
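
For comparison, the mylist.count() approach mentioned above looks roughly like the sketch below. Each count() call scans the whole list, so the total work can grow quadratically, while Counter makes a single pass. Both give the same duplicates here:

```python
from collections import Counter

mylist = [20, 30, 25, 20]

# Quadratic in the worst case: one full scan of mylist per distinct value.
dupes_count = [x for x in set(mylist) if mylist.count(x) > 1]

# Linear: a single pass builds all the counts.
dupes_counter = [k for k, v in Counter(mylist).items() if v > 1]

print(dupes_count)    # [20]
print(dupes_counter)  # [20]
```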

If you need to know the indices:

from collections import defaultdict
D = defaultdict(list)
for i,item in enumerate(mylist):
    D[item].append(i)
D = {k:v for k,v in D.items() if len(v)>1}
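
Wrapped in a function, the index version might look like this. A sketch only; the name duplicate_indices is illustrative and not part of the original answer:

```python
from collections import defaultdict

def duplicate_indices(items):
    """Map each duplicated value to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)
    # Keep only the values that occur more than once.
    return {k: v for k, v in positions.items() if len(v) > 1}

mylist = [20, 30, 25, 20]
print(duplicate_indices(mylist))  # {20: [0, 3]}
```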

Identifying duplicates in a list of character vectors in R

A binary output can be generated with

any(duplicated(unlist(my_list)))
[1] TRUE

As correctly pointed out in the comments by @sindri_baldur, if repeats within a single list element should not count as duplicates, apply unique to each element first:

any(duplicated(unlist(lapply(my_list, unique))))
[1] TRUE

Or, as another base R alternative:

anyDuplicated(unlist(lapply(my_list, unique))) > 0
[1] TRUE

How to print only the duplicate elements in python list

The Counter class from collections does the trick:

from collections import Counter

lst = [4,3,2,4,5,6,4,7,6,8]
d = Counter(lst) # -> Counter({4: 3, 6: 2, 3: 1, 2: 1, 5: 1, 7: 1, 8: 1})
res = [k for k, v in d.items() if v > 1]
print(res)
# [4, 6]

How to find duplicates in a list of list of objects in java

Actually, you have:

List<List<String>> compositeKeyValues;

Lists are equal if they have the same elements in the same order - like your example.

Finding duplicate inner Lists is no different from finding duplicates of other, simpler types.

Here's one way:

List<List<String>> duplicates = compositeKeyValues.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    .filter(e -> e.getValue().intValue() > 1)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());

This code will work even if you leave the type of the List as List<Object>, except the result would also have type List<Object>. However, it's recommended, and more useful, to use a more specific type List<List<String>>.

R - Finding duplicates in list entries

You can unlist first:

unlisted <- unlist(examplelist)
unlisted[duplicated(unlisted)]
#      b1      c1      c2
#   "red" "black" "green"

unlisted[!duplicated(unlisted)]
#       a1       a2       a3       b2       b3       c3
#   "blue"    "red" "yellow"  "black"  "green"  "brown"

If you only want the vector (without the names), use unname:

unlisted <- unname(unlist(examplelist))

How to identify duplicate records in a list?

You are looking for both registered and unregistered businesses. Instead of using 0 and 1, you could implement the attribute as a boolean isRegistered, with 0 becoming false and 1 becoming true going forward. Your existing if-else code could then be rewritten as:

Map<Boolean, List<MyVo>> partitionBasedOnRegistered = dataList.stream()
    .collect(Collectors.partitioningBy(MyVo::isRegistered));
List<MyVo> unregisteredBusinesses = partitionBasedOnRegistered.get(Boolean.FALSE); // here
List<MyVo> registeredBusinesses = partitionBasedOnRegistered.get(Boolean.TRUE);

How to find duplicates in a list?

Try this:

val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)
dup.groupBy(identity).collect { case (x, List(_,_,_*)) => x }

The groupBy associates each distinct integer with a list of its occurrences. The collect is basically map where non-matching elements are ignored. The match pattern following case will match integers x that are associated with a list that fits the pattern List(_,_,_*), a list with at least two elements, each represented by an underscore since we don't actually need to store those values (and those two elements can be followed by zero or more elements: _*).

You could also do:

dup.groupBy(identity).collect { case (x,ys) if ys.lengthCompare(1) > 0 => x }

It's much faster than the approach you provided since it doesn't have to repeatedly pass over the data.


