How to Remove Duplicates from a List, While Preserving Order

How do I remove duplicates from a list, while preserving order?

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
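To make the effect visible, here is a micro-benchmark sketch comparing the two spellings (the function names are mine, and absolute timings will vary by machine and Python version):

```python
from timeit import timeit

def dedupe_attr(seq):
    # resolves the seen.add attribute on every iteration
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]

def dedupe_local(seq):
    # binds the bound method to a local name once, up front
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

data = list(range(1000)) * 10
print(timeit(lambda: dedupe_attr(data), number=200))
print(timeit(lambda: dedupe_local(data), number=200))
```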

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion, and membership testing per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
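You can check both facts interactively: add() mutates the set in place but evaluates to None, which is falsy:

```python
seen = set()
print(seen.add('x'))   # None: add() updates the set but returns nothing
print('x' in seen)     # True
# So for a new x, `x in seen or seen.add(x)` adds x and the whole
# expression is falsy (False or None); for an already-seen x it is truthy.
```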

One-liner to remove duplicates, keep ordering of list

You could use an OrderedDict, but I suggest sticking with your for-loop.

>>> from collections import OrderedDict
>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> list(OrderedDict.fromkeys(data))
['Herb', 'Alec', 'Don']
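On Python 3.7+, plain dicts are guaranteed to preserve insertion order, so OrderedDict is no longer needed for this trick:

```python
data = ['Herb', 'Alec', 'Herb', 'Don']
# dict.fromkeys keeps the first occurrence of each key, in order
print(list(dict.fromkeys(data)))  # ['Herb', 'Alec', 'Don']
```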

Just to reiterate: I seriously suggest sticking with your for-loop approach, and use a set to keep track of already seen items:

>>> data = ['Herb', 'Alec', 'Herb', 'Don']
>>> seen = set()
>>> unique_data = []
>>> for x in data:
...     if x not in seen:
...         unique_data.append(x)
...         seen.add(x)
...
>>> unique_data
['Herb', 'Alec', 'Don']

And in case you just want to be wacky (seriously don't do this):

>>> [t[0] for t in sorted(dict(zip(reversed(data), range(len(data), -1, -1))).items(), key=lambda t:t[1])]
['Herb', 'Alec', 'Don']

How to remove duplicates from a list whilst preserving order?

from itertools import groupby

lst = ['a','a','b','a', 'a', 'c', 'c', 'c','d','e','a', 'b', 'b', 'b']

print([key for key, _ in groupby(lst)])
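One caveat worth stating plainly: groupby only collapses runs of *consecutive* equal items, so this removes all duplicates only when equal items are already adjacent (e.g. in sorted input). With the list above, repeats that are separated by other values survive:

```python
from itertools import groupby

lst = ['a', 'a', 'b', 'a', 'a', 'c', 'c', 'c', 'd', 'e', 'a', 'b', 'b', 'b']
# Only consecutive runs are merged; the later 'a' and 'b' reappear:
print([key for key, _ in groupby(lst)])  # ['a', 'b', 'a', 'c', 'd', 'e', 'a', 'b']
```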

Remove duplicates in a list while keeping its order (Python)

My answer to your other question (which you appear to have overlooked) shows that you're wrong in claiming that

The answers of that question did not
keep the "order"

  • my answer did keep order, and it clearly said it did. Here it is again, with added emphasis:

Probably the fastest approach, for a really big list, if you want to preserve the exact order of the items that remain, is the following...:

biglist = [
    {'title': 'U2 Band', 'link': 'u2.com'},
    {'title': 'ABC Station', 'link': 'abc.com'},
    {'title': 'Live Concert by U2', 'link': 'u2.com'},
]

known_links = set()
newlist = []

for d in biglist:
    link = d['link']
    if link in known_links:
        continue
    newlist.append(d)
    known_links.add(link)

biglist[:] = newlist

Need to remove duplicates from a nested list preserving the order

This works similarly to removing duplicates from a regular list whilst preserving order, but there are two extra issues to handle: sublists are not hashable, and the order of elements within a sublist is irrelevant.

We can use frozensets to solve both issues at once.

>>> lst = [[1,2],[1,3],[1,4],[2,1],[2,5],[3,1],[3,2]] 
>>> seen = set()
>>> result = []
>>>
>>> for x in lst:
...     s = frozenset(x)
...     if s not in seen:
...         result.append(x)
...         seen.add(s)
...
>>> result
[[1, 2], [1, 3], [1, 4], [2, 5], [3, 2]]

Is there a way to remove duplicates in a list while keeping original order?

def remove_duplicates(lst):
    i = 0
    while i < len(lst):
        j = i + 1
        while j < len(lst):  # check for duplicates of lst[i] and remove them
            if lst[i] == lst[j]:
                del lst[j]
            else:
                j += 1  # only increment second idx if the item is not removed!
        i += 1
    return

And testing it:

>>> lst = [1, 3, 0, 1, 4, 1, 1, 2, 2, 5, 4, 3, 1, 3, 3, 4, 2, 4, 3, 1, 3, 0, 3, 0, 0]
>>> remove_duplicates(lst)
>>> lst
[1, 3, 0, 4, 2, 5]

You could also implement it with a set instead of the second while loop (definitely faster, though possibly not allowed if this is an exercise):

def remove_duplicates(lst):
    i = 0
    found = set()
    while i < len(lst):
        if lst[i] in found:
            del lst[i]
        else:
            found.add(lst[i])
            i += 1
    return
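For completeness, a self-contained run of the set-based variant on the same test list as before gives the same result:

```python
def remove_duplicates(lst):
    # In-place dedup: keep the first occurrence of each value.
    i = 0
    found = set()
    while i < len(lst):
        if lst[i] in found:
            del lst[i]       # don't advance i; the next item shifted into slot i
        else:
            found.add(lst[i])
            i += 1

lst = [1, 3, 0, 1, 4, 1, 1, 2, 2, 5, 4, 3, 1, 3, 3, 4, 2, 4, 3, 1, 3, 0, 3, 0, 0]
remove_duplicates(lst)
print(lst)  # [1, 3, 0, 4, 2, 5]
```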

A quick note on why your approach couldn't work can be found in the documentation of list.remove:

list.remove(x)

Remove the first item from the list whose value is x. It is an error if there is no such item.

But you want to remove all occurrences except the first one!
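A quick interactive check makes the point:

```python
lst = [1, 2, 1, 1]
lst.remove(1)   # removes only the FIRST occurrence of 1
print(lst)      # [2, 1, 1]
```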

Removing elements from a list preserving order and one copy of duplicates

First of all, do not alter l1 as you iterate over it: that will throw off your iteration indexing and give undesirable results.

Looking at the logic another way, l3 is composed of

  • l1 elements that do not appear in l2
  • l1 elements that do appear in l2, but are in l1 more than once

You can attack this one of two ways: (1) iterate over l1 and check these conditions for each element; (2) iterate over l2, identifying elements to remove; then build l3 from l1, removing elements and reducing remaining duplicates as needed.

You can use the count method to determine whether an item appears more than once, as in

if l1.count(item) > 1:
    l3.append(item)

Detailed design and coding are left as an exercise for the student. :-)
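That said, here is a minimal sketch of one reading of the problem: remove one occurrence of each l2 element from l1, preserving order, so that duplicated values keep their extra copies. The function name and the Counter-based bookkeeping are my choices, not the original poster's:

```python
from collections import Counter

def remove_one_copy(l1, l2):
    # Budget of removals: one per occurrence of each value in l2.
    to_remove = Counter(l2)
    l3 = []
    for item in l1:
        if to_remove[item] > 0:
            to_remove[item] -= 1  # consume one removal for this value
        else:
            l3.append(item)
    return l3

l1 = [1, 2, 2, 3, 4]
l2 = [2, 3]
print(remove_one_copy(l1, l2))  # [1, 2, 4]
```

Values in l1 that never appear in l2 (here 1 and 4) pass through untouched; 2 appears twice in l1, so one copy survives; 3 appears once and is removed entirely.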


