Fastest Way to Remove Duplicate Value from a List<> by Lambda

Fastest way to Remove Duplicate Value from a list by lambda

The easiest way to get a new list would be:

List<long> unique = longs.Distinct().ToList();

Is that good enough for you, or do you need to mutate the existing list? The latter is significantly more long-winded.

Note that Distinct() isn't guaranteed to preserve the original order, but in the current implementation it will - and that's the most natural implementation. See my Edulinq blog post about Distinct() for more information.

If you don't need it to be a List<long>, you could just keep it as:

IEnumerable<long> unique = longs.Distinct();

At this point it will go through the de-duping each time you iterate over unique though. Whether that's good or not will depend on your requirements.

Remove duplicates from a ListT in C#

Perhaps you should consider using a HashSet.

From the MSDN link:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        HashSet<int> evenNumbers = new HashSet<int>();
        HashSet<int> oddNumbers = new HashSet<int>();

        for (int i = 0; i < 5; i++)
        {
            // Populate numbers with just even numbers.
            evenNumbers.Add(i * 2);

            // Populate oddNumbers with just odd numbers.
            oddNumbers.Add((i * 2) + 1);
        }

        Console.Write("evenNumbers contains {0} elements: ", evenNumbers.Count);
        DisplaySet(evenNumbers);

        Console.Write("oddNumbers contains {0} elements: ", oddNumbers.Count);
        DisplaySet(oddNumbers);

        // Create a new HashSet populated with even numbers.
        HashSet<int> numbers = new HashSet<int>(evenNumbers);
        Console.WriteLine("numbers UnionWith oddNumbers...");
        numbers.UnionWith(oddNumbers);

        Console.Write("numbers contains {0} elements: ", numbers.Count);
        DisplaySet(numbers);
    }

    private static void DisplaySet(HashSet<int> set)
    {
        Console.Write("{");
        foreach (int i in set)
        {
            Console.Write(" {0}", i);
        }
        Console.WriteLine(" }");
    }
}

/* This example produces output similar to the following:
 * evenNumbers contains 5 elements: { 0 2 4 6 8 }
 * oddNumbers contains 5 elements: { 1 3 5 7 9 }
 * numbers UnionWith oddNumbers...
 * numbers contains 10 elements: { 0 2 4 6 8 1 3 5 7 9 }
 */

Remove duplicates from a list of objects based on property in Java 8

You can get a stream from the List and put in in the TreeSet from which you provide a custom comparator that compares id uniquely.

Then if you really need a list you can put then back this collection into an ArrayList.

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.toCollection;

...
List<Employee> unique = employee.stream()
                                .collect(collectingAndThen(toCollection(() -> new TreeSet<>(comparingInt(Employee::getId))),
                                                           ArrayList::new));

Given the example:

List<Employee> employee = Arrays.asList(new Employee(1, "John"), new Employee(1, "Bob"), new Employee(2, "Alice"));

It will output:

[Employee{id=1, name='John'}, Employee{id=2, name='Alice'}]

Another idea could be to use a wrapper that wraps an employee and have the equals and hashcode method based with its id:

class WrapperEmployee {
    private Employee e;

    public WrapperEmployee(Employee e) {
        this.e = e;
    }

    public Employee unwrap() {
        return this.e;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        WrapperEmployee that = (WrapperEmployee) o;
        return Objects.equals(e.getId(), that.e.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hash(e.getId());
    }
}

Then you wrap each instance, call distinct(), unwrap them and collect the result in a list.

List<Employee> unique = employee.stream()
                                .map(WrapperEmployee::new)
                                .distinct()
                                .map(WrapperEmployee::unwrap)
                                .collect(Collectors.toList());

In fact, I think you can make this wrapper generic by providing a function that will do the comparison:

public class Wrapper<T, U> {
    private T t;
    private Function<T, U> equalityFunction;

    public Wrapper(T t, Function<T, U> equalityFunction) {
        this.t = t;
        this.equalityFunction = equalityFunction;
    }

    public T unwrap() {
        return this.t;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        @SuppressWarnings("unchecked")
        Wrapper<T, U> that = (Wrapper<T, U>) o;
        return Objects.equals(equalityFunction.apply(this.t), that.equalityFunction.apply(that.t));
    }

    @Override
    public int hashCode() {
        return Objects.hash(equalityFunction.apply(this.t));
    }
}

and the mapping will be:

.map(e -> new Wrapper<>(e, Employee::getId))

Remove objects with a duplicate property from List

If you want to avoid using a third-party library, you could do something like:

var bar = fooArray.GroupBy(x => x.Id).Select(x => x.First()).ToList();

That will group the array by the Id property, then select the first entry in the grouping.

Remove duplicates in the list using linq

var distinctItems = items.Distinct();

To match on only some of the properties, create a custom equality comparer, e.g.:

class DistinctItemComparer : IEqualityComparer<Item> {

    public bool Equals(Item x, Item y) {
        return x.Id == y.Id &&
            x.Name == y.Name &&
            x.Code == y.Code &&
            x.Price == y.Price;
    }

    public int GetHashCode(Item obj) {
        return obj.Id.GetHashCode() ^
            obj.Name.GetHashCode() ^
            obj.Code.GetHashCode() ^
            obj.Price.GetHashCode();
    }
}

Then use it like this:

var distinctItems = items.Distinct(new DistinctItemComparer());

Removing duplicates in lists

The common approach to get a unique collection of items is to use a set. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in set() function. If you later need a real list again, you can similarly pass the set to the list() function.

The following example should cover whatever you are trying to do:

>>> t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

Maintaining order

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on OrderedDict to keep the order of keys during insertion:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):

>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Note that this may have some overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re often better off using a set, especially because it gives you a lot more operations to work with. Check out this question for more details and alternative ways to preserve the order when removing duplicates.

Finally note that both the set as well as the OrderedDict/dict solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you will basically have to compare every item with every other item in a nested loop.

how to remove empty strings from list, then remove duplicate values from a list

dtList = dtList.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList();

I assumed empty string and whitespace are like null. If not you can use IsNullOrEmpty (allow whitespace), or s != null

Fastest Way to Remove Duplicate Value from a List<> by Lambda