Remove Duplicates in a Django Query

This query does not itself produce duplicates: it returns every row in the table, ordered by email.

However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field of your own, the model also has an automatic id field. Every id + email combination is therefore unique, so distinct() has nothing to remove.

Assuming you only need one field, email, de-duplicated, you can do this:

email_list = Email.objects.values_list('email', flat=True).distinct()
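
To see why the id column defeats DISTINCT, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the Django ORM. The table name email and its three rows are made up for illustration, and the two SELECT statements only approximate the SQL the ORM would generate.

```python
import sqlite3

# Hypothetical table standing in for the Email model:
# an automatic id plus a single email column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# DISTINCT over (id, email): the id makes every row unique, so nothing is dropped.
with_id = conn.execute(
    "SELECT DISTINCT id, email FROM email ORDER BY email"
).fetchall()

# DISTINCT over email only, roughly what
# values_list('email', flat=True).distinct() asks for.
email_only = [
    row[0]
    for row in conn.execute("SELECT DISTINCT email FROM email ORDER BY email")
]

print(len(with_id))  # 3
print(email_only)    # ['a@example.com', 'b@example.com']
```

Selecting only the column you care about is what lets DISTINCT collapse the repeated addresses.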

However, you should really fix the root problem, and remove the duplicate data from your database.

For example, to delete duplicate Email rows by the email field:

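for email in Email.objects.values_list('email', flat=True).distinct():
    # Keep the row with the lowest pk; list() avoids a sliced subquery, which some databases reject.
    duplicate_ids = list(Email.objects.filter(email=email).order_by('pk').values_list('pk', flat=True)[1:])
    Email.objects.filter(pk__in=duplicate_ids).delete()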

Or books by name:

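for name in Book.objects.values_list('name', flat=True).distinct():
    # Keep the row with the lowest pk; list() avoids a sliced subquery, which some databases reject.
    duplicate_ids = list(Book.objects.filter(name=name).order_by('pk').values_list('pk', flat=True)[1:])
    Book.objects.filter(pk__in=duplicate_ids).delete()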

Removing duplicate objects within a Django QuerySet

Instead, you could combine Q objects with OR in a single filter, which produces one database query:

from functools import reduce
from operator import or_

from django.db.models import Q

words = ['software', 'engineer']
or_filter = reduce(or_, (Q(job_title__icontains=word) for word in words))

Vacancy.objects.filter(or_filter)

Alternatively, before appending an object to the list, check whether it is already there, for instance by keeping a set of object ids.
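
The set-of-ids approach could look roughly like this. The Vacancy namedtuple and the merge_unique helper are hypothetical stand-ins so the sketch runs without a Django project; with real model instances you would check obj.pk in the same way.

```python
from collections import namedtuple

# Hypothetical stand-in for a model instance; only the id attribute matters here.
Vacancy = namedtuple("Vacancy", ["id", "job_title"])


def merge_unique(*result_lists):
    """Concatenate result lists, keeping the first occurrence of each id."""
    seen_ids = set()
    merged = []
    for results in result_lists:
        for obj in results:
            if obj.id not in seen_ids:
                seen_ids.add(obj.id)
                merged.append(obj)
    return merged


# Results of two separate per-word searches; vacancy 1 matches both words.
software = [Vacancy(1, "Software Engineer"), Vacancy(2, "Software Tester")]
engineer = [Vacancy(1, "Software Engineer"), Vacancy(3, "Civil Engineer")]

merged = merge_unique(software, engineer)
print([v.id for v in merged])  # [1, 2, 3]
```

This keeps the original order of first appearance, but it still runs one query per word, so the single Q-based query is usually the better choice.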

How to Remove Duplicates Values after Merging Different Model Querysets in Django

You can use union(); see the Django QuerySet documentation for details.

qs1.union(qs2)
# no duplicates

By default, union() only gives you distinct values. If you want to allow duplicates, use all=True:

qs1.union(qs2, all=True)
# allow duplicates
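
At the SQL level the difference is UNION versus UNION ALL. The sketch below shows it with Python's built-in sqlite3 module rather than the ORM; the tables t1 and t2 and their rows are invented for illustration.

```python
import sqlite3

# Two hypothetical tables that share one value ("bob").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT)")
conn.execute("CREATE TABLE t2 (name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("alice",), ("bob",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("bob",), ("carol",)])

# UNION drops rows that are identical across the selected columns.
distinct_rows = conn.execute(
    "SELECT name FROM t1 UNION SELECT name FROM t2 ORDER BY name"
).fetchall()

# UNION ALL keeps every row, including the repeated "bob".
all_rows = conn.execute(
    "SELECT name FROM t1 UNION ALL SELECT name FROM t2 ORDER BY name"
).fetchall()

print(distinct_rows)  # [('alice',), ('bob',), ('carol',)]
print(all_rows)       # [('alice',), ('bob',), ('bob',), ('carol',)]
```

Rows only count as duplicates when all selected columns match. When the querysets come from different models, this usually means restricting both sides to the same columns first, for example with values_list().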

How to remove duplicate values from QuerySet?

You are looking for .distinct().

So your new query will look like this:

users = User.objects.filter(is_active=True, article_creator__in=articles).distinct()
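
Duplicates like these typically come from the join: a user who created several matching articles appears once per article. Here is a minimal sketch of that effect in plain SQL via Python's built-in sqlite3 module. The users and articles tables, their columns, and the rows are assumptions made for illustration, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER)")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, creator_id INTEGER)")

# One active user who created two articles.
conn.execute("INSERT INTO users (name, is_active) VALUES ('ann', 1)")
conn.executemany("INSERT INTO articles (creator_id) VALUES (?)", [(1,), (1,)])

join_sql = (
    "SELECT {} users.id, users.name FROM users "
    "JOIN articles ON articles.creator_id = users.id "
    "WHERE users.is_active = 1"
)

# Without DISTINCT the join yields one row per matching article.
plain = conn.execute(join_sql.format("")).fetchall()

# With DISTINCT the identical rows collapse into one.
deduped = conn.execute(join_sql.format("DISTINCT")).fetchall()

print(plain)    # [(1, 'ann'), (1, 'ann')]
print(deduped)  # [(1, 'ann')]
```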

Delete Duplicate Rows in Django DB

The simplest way is the simplest way! That is especially true for one-off scripts where performance doesn't matter (unless it does). Since it's not core code, I'd just write the first thing that comes to mind and works.

# assuming which duplicate is removed doesn't matter...
# reverse() has no effect without a defined ordering, hence order_by('pk')
for row in MyModel.objects.order_by('pk').reverse():
    if MyModel.objects.filter(photo_id=row.photo_id).count() > 1:
        row.delete()

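Iterating in reverse deletes the later duplicates first, so the first instance of each photo_id is kept rather than the last. Note that .reverse() only has an effect on a queryset with a defined ordering (from order_by() or the model's Meta.ordering).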

As always, back up before you do this stuff.
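
If the table is large, running one count() query per row can get slow. One alternative, which is a different technique from the loop above, is to keep the lowest id for each photo_id and delete everything else in a single statement. The sketch below shows the idea in plain SQL through Python's built-in sqlite3 module; the mymodel table and its rows are hypothetical, and some databases may need a different form of this statement.

```python
import sqlite3

# Hypothetical table standing in for MyModel: photo_id 10 appears three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, photo_id INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (photo_id) VALUES (?)",
    [(10,), (20,), (10,), (10,), (30,)],
)

# Keep the lowest id per photo_id and delete every other row.
conn.execute(
    "DELETE FROM mymodel "
    "WHERE id NOT IN (SELECT MIN(id) FROM mymodel GROUP BY photo_id)"
)

remaining = conn.execute("SELECT id, photo_id FROM mymodel ORDER BY id").fetchall()
print(remaining)  # [(1, 10), (2, 20), (5, 30)]
```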


