Remove duplicates in a Django query
This query will not give you duplicates, i.e. it will give you every row in the table exactly once, ordered by email.
However, I presume what you mean is that you have duplicate data within your database. Adding distinct() here won't help, because even if you have only one field of your own, you also have an automatic id field. Every id+email combination is therefore already unique, and distinct() has nothing to collapse.
Assuming you only need one field, email, de-duplicated, you can do this:
email_list = Email.objects.values_list('email', flat=True).distinct()
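To see why selecting only the one column matters, here is a small sketch using the standard-library sqlite3 module in place of Django. The table name, columns and addresses are made up for illustration; the two SELECT statements only approximate what the ORM would send.

```python
import sqlite3

# In-memory table standing in for the Email model (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# DISTINCT over (id, email): nothing collapses, because id differs on every row.
rows_with_id = conn.execute(
    "SELECT DISTINCT id, email FROM email ORDER BY email, id"
).fetchall()

# DISTINCT over email alone: the repeated address collapses to one value.
emails_only = [
    row[0]
    for row in conn.execute("SELECT DISTINCT email FROM email ORDER BY email")
]

print(len(rows_with_id))  # 3
print(emails_only)        # ['a@example.com', 'b@example.com']
```

This is roughly the difference between calling distinct() on whole model rows and calling it after values_list('email', flat=True).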
However, you should really fix the root problem, and remove the duplicate data from your database.
Example, deleting duplicate Emails by email field:
for email in Email.objects.values_list('email', flat=True).distinct():
    # Keep the row with the lowest id for this address and delete the rest.
    Email.objects.filter(pk__in=Email.objects.filter(email=email).order_by('id').values_list('id', flat=True)[1:]).delete()
Or books by name:
for name in Book.objects.values_list('name', flat=True).distinct():
    # Keep the row with the lowest id for this name and delete the rest.
    Book.objects.filter(pk__in=Book.objects.filter(name=name).order_by('id').values_list('id', flat=True)[1:]).delete()
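The loops above issue one delete per distinct value. As a rough sketch of the same keep-one-row-per-value idea written as a single SQL statement, the example below uses the standard-library sqlite3 module. The table and data are invented, and this is plain SQL rather than the ORM approach shown above.

```python
import sqlite3

# In-memory table standing in for the Email model (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO email (email) VALUES (?)",
    [
        ("a@example.com",),
        ("b@example.com",),
        ("a@example.com",),
        ("a@example.com",),
    ],
)

# Keep the lowest id for each address and delete every other row.
conn.execute(
    "DELETE FROM email "
    "WHERE id NOT IN (SELECT MIN(id) FROM email GROUP BY email)"
)

remaining = conn.execute("SELECT id, email FROM email ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

Whether a single statement like this is preferable to the loop depends on the table size and the database backend, so treat it as one option rather than a recommendation.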
Removing duplicate objects within a Django QuerySet
You could instead chain Q objects in your filter, which produces a single query to the database:
from functools import reduce
from operator import or_
from django.db.models import Q
words = ['software', 'engineer']
or_filter = reduce(or_, (Q(job_title__icontains=word) for word in words))
Vacancy.objects.filter(or_filter)
Or you could check whether the object is already in the list before appending it, for instance by keeping a set of object ids.
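A minimal sketch of that set-of-ids approach, in plain Python. The Vacancy dataclass and the collect_unique helper are hypothetical stand-ins for model instances and your own merging code, not part of Django.

```python
from dataclasses import dataclass


@dataclass
class Vacancy:
    """Stand-in for a model instance; only the fields used here."""
    id: int
    job_title: str


def collect_unique(batches):
    """Merge several result lists, keeping each id once in first-seen order."""
    seen_ids = set()
    unique = []
    for batch in batches:
        for obj in batch:
            if obj.id not in seen_ids:
                seen_ids.add(obj.id)
                unique.append(obj)
    return unique


# Two result lists that overlap on id 1, as separate per-word queries might.
software = [Vacancy(1, "Software Engineer"), Vacancy(2, "Software Tester")]
engineer = [Vacancy(1, "Software Engineer"), Vacancy(3, "Civil Engineer")]

result = collect_unique([software, engineer])
print([v.id for v in result])  # [1, 2, 3]
```

The set makes each membership check cheap, while the list preserves the order in which objects were first seen.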
How to Remove Duplicates Values after Merging Different Model Querysets in Django
You can use union(), which is described in the Django QuerySet documentation:
qs1.union(qs2)
# no duplicates
By default, union() only gives you distinct values. If you want to allow duplicates, pass all=True:
qs1.union(qs2, all=True)
# allow duplicates
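In SQL terms the two calls correspond to UNION and UNION ALL. The sketch below shows the difference with the standard-library sqlite3 module; the tables t1 and t2 and their contents are made up for illustration.

```python
import sqlite3

# Two small in-memory tables that share one value ("bob").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT)")
conn.execute("CREATE TABLE t2 (name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("alice",), ("bob",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("bob",), ("carol",)])

# UNION drops rows that are identical across the selected columns.
distinct_rows = conn.execute(
    "SELECT name FROM t1 UNION SELECT name FROM t2 ORDER BY name"
).fetchall()

# UNION ALL keeps every row from both sides.
all_rows = conn.execute(
    "SELECT name FROM t1 UNION ALL SELECT name FROM t2 ORDER BY name"
).fetchall()

print(distinct_rows)  # [('alice',), ('bob',), ('carol',)]
print(all_rows)       # [('alice',), ('bob',), ('bob',), ('carol',)]
```

Duplicates are judged on the selected columns as a whole, so when merging different models it may help to select only the shared fields (for example with values_list()) before calling union().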
How to remove duplicate values from QuerySet?
You are looking for .distinct(), so your new query will look like this:
users = User.objects.filter(is_active=True, article_creator__in=articles).distinct()
Delete Duplicate Rows in Django DB
The simplest way is the simplest way! Especially for one-off scripts where performance doesn't even matter (unless it does). Since it's not core code, I'd just write the first thing that comes to mind and works.
# assuming which duplicate is removed doesn't matter...
for row in MyModel.objects.order_by('pk').reverse():
    if MyModel.objects.filter(photo_id=row.photo_id).count() > 1:
        row.delete()
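Use .reverse() to delete the later duplicates first and keep the first instance, rather than the last. Note that .reverse() only has an effect on a queryset with a defined ordering, either a default ordering on the model or an explicit order_by().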
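To check that walking the rows newest-first really leaves the oldest copy, here is a plain-Python simulation of the loop above. The dictionary of id to photo_id pairs is invented data standing in for the table.

```python
# id -> photo_id, standing in for MyModel rows (hypothetical data).
rows = {1: "p1", 2: "p2", 3: "p1", 4: "p1", 5: "p2"}

# Visit ids from highest to lowest, mirroring the reversed queryset.
for row_id in sorted(rows, reverse=True):
    photo_id = rows[row_id]
    # Count how many rows still share this photo_id, like the .count() query.
    copies = sum(1 for value in rows.values() if value == photo_id)
    if copies > 1:
        del rows[row_id]

print(rows)  # {1: 'p1', 2: 'p2'}
```

Each later copy is removed while more than one remains, so the lowest id for every photo_id survives.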
As always, back up before you do this stuff.