How to Sort and Remove Duplicates from Python List

How to sort and remove duplicates from Python list?

A list can be sorted and deduplicated using built-in functions:

myList = sorted(set(myList))
  • set is a built-in type for Python >= 2.4 (Python 2.3 only offered it through the sets module)
  • sorted is a built-in function for Python >= 2.4
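
As a quick illustration (the sample values are made up, and my_list is just a placeholder name), the one-liner behaves roughly like this:

```python
my_list = [3, 1, 2, 3, 1]

# set() drops the duplicates; sorted() returns a new list in ascending order.
result = sorted(set(my_list))
print(result)  # [1, 2, 3]
```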

Removing duplicates and sorting list python

You only need a slight modification. You are comparing strings instead of numbers, so try this instead:

def sorted_elements(numbers):
    return sorted(set(numbers))

testcase = int(input())
while testcase > 0:
    numbers = map(int, input().split())
    l = sorted_elements(numbers)

    for x in l:
        print(x, end=' ')

    print()
    testcase -= 1

If you want, you can also do:

numbers = (int(x) for x in input().split())
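
To see why the conversion to int matters, here is a small sketch with a hard-coded input line (the string is an assumption standing in for what input() would return):

```python
raw = "10 9 2 10 33".split()

# Sorting the raw strings compares them character by character.
as_strings = sorted(set(raw))

# Converting to int first gives numeric order.
as_numbers = sorted(set(int(x) for x in raw))

print(as_strings)  # ['10', '2', '33', '9']
print(as_numbers)  # [2, 9, 10, 33]
```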

Python: How to sort items and remove duplicates in one pass using custom comparator?

You can do this by sorting the list first, and then applying itertools.groupby. groupby groups the input iterable of rows according to the key, and outputs an iterable of tuples containing the group key value and the rows in that group.

Also, don't forget that itemgetter from the operator module is the perfect way of expressing both a sorting and a grouping key in most cases:

[k for k, _ in itertools.groupby(sorted(myList, key=itemgetter(1, 0)), key=itemgetter(0, 1))]

Notice that we sort first by the right hand value (index 1) in the tuple, but use the left hand value (index 0) as a tie-breaker. This is necessary, otherwise the final sort order might include something like ("a", 3), ("b", 3), ("a", 3) and that duplicate ("a", 3) would then not be removed.

If you only want to iterate through the result once then you can (and should) write this as a generator expression, which avoids the overhead of instantiating a second list:

g = (k for k, _ in itertools.groupby(sorted(myList, key=itemgetter(1, 0)), key=itemgetter(0, 1)))
for a, b in g:
    print(a, b)
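
For reference, a self-contained version of the same idea might look like the following. The sample tuples are invented for illustration; only the imports and the two keys come from the snippets above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (name, value) pairs with duplicates.
my_list = [("a", 3), ("b", 3), ("a", 3), ("c", 1), ("b", 2), ("c", 1)]

# Sort by value, then by name, so equal tuples end up next to each other.
ordered = sorted(my_list, key=itemgetter(1, 0))

# groupby collapses each run of equal (name, value) keys into one group.
unique = [k for k, _ in groupby(ordered, key=itemgetter(0, 1))]
print(unique)  # [('c', 1), ('b', 2), ('a', 3), ('b', 3)]
```

groupby only merges adjacent items, which is why the sort has to come first.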

How do I remove duplicates from a list, while preserving order?

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
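
If the `or` trick feels too clever, the same logic can be spelled out as a plain loop. This is only a sketch; unique_in_order is a made-up name, and it will likely be a little slower than f7:

```python
def unique_in_order(seq):
    """Return the items of seq in first-seen order, without duplicates."""
    seen = set()
    result = []
    for x in seq:
        if x not in seen:      # the membership test from the comprehension
            seen.add(x)        # the side effect that f7 hides behind `or`
            result.append(x)
    return result

print(unique_in_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```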

Best / most pythonic way to remove duplicates from a list and sort in reverse order

sorted(set(orig_list), reverse=True)

Shortest in code, more efficient, same result.

Depending on the size, it may or may not be faster to sort first then dedupe in linear time as user2864740 suggests in comments. (The biggest drawback to that approach is it would be entirely in Python, while the above line executes mostly in native code.)

Your questions:

  • You do not need to convert from set to list and back. sorted accepts any iterable, so set qualifies, and spits out a list, so no post-conversion needed.

  • reversed(sorted(x)) is not equivalent to sorted(x, reverse=True). Iterating over it yields the same elements here, but reversed returns an iterator rather than a list, and it is slower: sorting takes the same time forward or in reverse, so reversed adds an extra operation that is not needed if you sort into the proper order from the start.
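
A short sketch of both points, using an invented sample list:

```python
x = [2, 3, 1, 3]

# sorted() takes the set directly and already returns a list.
descending = sorted(set(x), reverse=True)
print(descending)  # [3, 2, 1]

# reversed() hands back an iterator, so it needs list() before comparing.
rev = reversed(sorted(set(x)))
print(isinstance(rev, list))  # False
as_list = list(rev)
print(as_list == descending)  # True
```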

Removing duplicates in lists

The common approach to get a unique collection of items is to use a set. Sets are unordered collections of distinct objects. To create a set from any iterable, you can simply pass it to the built-in set() function. If you later need a real list again, you can similarly pass the set to the list() function.

The following example should cover whatever you are trying to do:

>>> t = [1, 2, 3, 1, 2, 3, 5, 6, 7, 8]
>>> list(set(t))
[1, 2, 3, 5, 6, 7, 8]
>>> s = [1, 2, 3]
>>> list(set(t) - set(s))
[8, 5, 6, 7]

As you can see from the example result, the original order is not maintained. As mentioned above, sets themselves are unordered collections, so the order is lost. When converting a set back to a list, an arbitrary order is created.

Maintaining order

If order is important to you, then you will have to use a different mechanism. A very common solution for this is to rely on OrderedDict to keep the order of keys during insertion:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Starting with Python 3.7, the built-in dictionary is guaranteed to maintain the insertion order as well, so you can also use that directly if you are on Python 3.7 or later (or CPython 3.6):

>>> list(dict.fromkeys(t))
[1, 2, 3, 5, 6, 7, 8]

Note that this may have some overhead of creating a dictionary first, and then creating a list from it. If you don’t actually need to preserve the order, you’re often better off using a set, especially because it gives you a lot more operations to work with. See the question "How do I remove duplicates from a list, while preserving order?" for more details and alternative ways to preserve the order when removing duplicates.


Finally note that both the set as well as the OrderedDict/dict solutions require your items to be hashable. This usually means that they have to be immutable. If you have to deal with items that are not hashable (e.g. list objects), then you will have to use a slow approach in which you will basically have to compare every item with every other item in a nested loop.
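
For unhashable items, the slow approach could be sketched as follows. dedupe_unhashable is a hypothetical helper name, and the `in` test on a list is what performs the inner loop of comparisons:

```python
def dedupe_unhashable(items):
    """Remove duplicates using == only; quadratic in the worst case."""
    result = []
    for item in items:
        # `in` on a list compares item against every element kept so far.
        if item not in result:
            result.append(item)
    return result

print(dedupe_unhashable([[1, 2], [3], [1, 2], [3], [4]]))  # [[1, 2], [3], [4]]
```

As a side effect, this version also keeps the original order.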

Removing duplicates from a list and sorting it using python

I doubt your teacher meant to use strip() to eliminate duplicates, but to remove the whitespace after the name.
Since this looks like a homework problem, I won't give you the solution, but I'll try to point you in the right direction.

You should probably know how to read data, either with file = open("file") or with open("file") as f. So, with a list of names, we can get around to eliminating duplicates. However, each name may carry unwanted characters at the end (in particular \n for a newline). To get rid of them, call word.strip(), which returns a copy of the string with leading and trailing whitespace removed. Note that reassigning the loop variable inside a for loop does not change the list itself, so once you have your list of names, build a new one with something like

names = [name.strip() for name in names]

As you said, you are aware of sets. However, sets are unordered, so when you convert a list to a set with set(names) and then back to a list with list(...), the original order is lost. That is easily fixed with the handy built-in function sorted(), which accepts the set directly and returns a list of the names in alphabetical order.

It is then trivial to print the list, with something to the effect of

for i in names:  # names is your list
    print(i)

EDIT: If you aren't familiar with sets, there are more understandable ways,
for example (this isn't very efficient):


  1. Keep an empty list to store names you have already seen (seen)
  2. Iterate through your list of names, and for each name

    1. If the name is already in seen, skip it. (Avoid removing items from the list you are looping over; also note that list.pop expects an index, not a value.)
    2. If it is not, add it to seen with seen.append
  3. Print seen, which now holds each name exactly once!

remove duplicates from 2d lists regardless of order

Assuming a is your list of lists (its definition is not shown here):

In [3]: b = []
In [4]: for aa in a:
   ...:     if not any([set(aa) == set(bb) for bb in b if len(aa) == len(bb)]):
   ...:         b.append(aa)
In [5]: b
Out[5]: [[1, 2], [1, 3], [2, 3]]
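
The loop above compares every sub-list against everything kept so far. A possible alternative, sketched here with a made-up input (the original session does not show a), remembers a hashable key per sub-list instead. The key pairs the length with a frozenset to mirror the same two checks:

```python
# Hypothetical input; the original session does not show how `a` was defined.
a = [[1, 2], [2, 1], [1, 3], [3, 1], [2, 3]]

seen = set()
b = []
for aa in a:
    # Same length and same set of elements means "duplicate regardless of order".
    key = (len(aa), frozenset(aa))
    if key not in seen:
        seen.add(key)
        b.append(aa)

print(b)  # [[1, 2], [1, 3], [2, 3]]
```

This needs the inner elements to be hashable, which the set(aa) call in the original loop already requires.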

Python: Remove duplicate lists in list of lists

Try this:

list_ = [["A"], ["B"], ["A","B"], ["B","A"], ["A","B","C"], ["B", "A", "C"]]
l = list(map(list, set(map(tuple, map(set, list_)))))

Output (the order may vary between runs):

[['A', 'B'], ['B'], ['A', 'B', 'C'], ['A']]

The process works like this:

  1. First convert each sub-list into a set. Thus ['A', 'B'] and ['B', 'A'] both become {'A', 'B'}.
  2. Now convert each set to a tuple. Sets are not hashable, so they cannot be placed inside another set, whereas tuples can.
  3. Apply set() to the tuples to keep only the unique ones.
  4. Finally, convert each remaining tuple back into a list.

This is equivalent to:

list_ = [['A'], ['B'], ['A', 'B'], ['B', 'A'], ['A', 'B', 'C'], ['B', 'A', 'C']]
l0 = [set(i) for i in list_]
# l0 = [{'A'}, {'B'}, {'A', 'B'}, {'A', 'B'}, {'A', 'B', 'C'}, {'A', 'B', 'C'}]
l1 = [tuple(i) for i in l0]
# l1 = [('A',), ('B',), ('A', 'B'), ('A', 'B'), ('A', 'B', 'C'), ('A', 'B', 'C')]
l2 = set(l1)
# l2 = {('A', 'B'), ('A',), ('B',), ('A', 'B', 'C')}
l = [list(i) for i in l2]
# l = [['A', 'B'], ['A'], ['B'], ['A', 'B', 'C']]
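
One caveat: equal sets are not guaranteed to iterate in the same order, so going through tuple() could in principle let a duplicate slip through. A variation that sidesteps this, offered as a sketch rather than as part of the answer above, uses frozenset, which is hashable on its own:

```python
list_ = [["A"], ["B"], ["A", "B"], ["B", "A"], ["A", "B", "C"], ["B", "A", "C"]]

# frozenset is hashable, so it can go into a set directly.
unique_sets = {frozenset(sub) for sub in list_}

# Sort inside and across the sub-lists to get a reproducible result.
result = sorted(sorted(s) for s in unique_sets)
print(result)  # [['A'], ['A', 'B'], ['A', 'B', 'C'], ['B']]
```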

