Incorrect results with `annotate` + `values` + `union` in Django
After some debugging and going through the source code, I have an idea why this is happening. What I am going to do is try to explain that why doing annotate
+ values
results in displaying the id
and what is the difference between the two cases above.
To keep things simple, I will write also write the possible resulting sql query for each statement.
1. annotate
first but get values
on union query
qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))
When writing something like this, django will get all the fields + annotated fields, so the resulting sql query looks like:
select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA
So, if we have a query
which is the result of:
qs = qs1.union(qs2)
the resulting sql for django looks like:
(select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA)
UNION
(select id, field_1b, field_2b, field_1b as field1, field_2b as field2 from ModelB)
Let's go deeper into how this sql is generated. When we do a union
, a combinator
and combined_queries
is set on the qs.query
and the resulting sql is generated by combining the sql of individual queries. So, in summary:
qs.sql == qs1.sql UNION qs2.sql # in abstract sense
When, we do qs.values('field1', 'field2')
, the col_count
in compiler is set to 2 which is the number of fields. As you can see that the union query above returns 5 columns but in the final return from compiler each row in the results is sliced using col_count
. Now, this results
with only 2 columns is passed back to ValuesIterable
where it maps each name in the selected fields with the resulting columns. That is how it leads to the incorrect results.
2. annotate
+ values
on individual queries and then perform union
Now, let's see what happens when annotate
is used with values
directly
qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values('field1', 'field2')
The resulting sql is:
select field_1a as field1, field_2a as field2 from ModelA
Now, when we do the union:
qs = qs1.union(qs2)
the sql is:
(select field_1a as field1, field_2a as field2 from ModelA)
UNION
(select field_1b as field1, field_2b as field2 from ModelB)
Now, when qs.values('field1', 'field2')
executes, the number of columns returned from union query has 2 columns which is same as the col_count
which is 2 and each field is matched with the individual columns producing the expected result.
3. Different field annotation count and ordering of fields
In the OP, there is a scenario when even using .values
before union
doesn't produce correct results. The reason for that is that in the ModelB
, there is no annotation for extra
field.
So, let's look at the queries generated for each model:
ModelA.objects.all()
.annotate(
field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
)
.values(*values)
The SQL becomes:
select field_1a as field1, field_2a as field2, extra_a as extra from ModelA
For ModelB:
ModelB.objects.all()
.annotate(field1=F("field_1b"), field2=F("field_2b"))
.values(*values)
SQL:
select extra, field_1b as field1, field_2b as field2 from ModelB
and the union is:
(select field_1a as field1, field_2a as field2, extra_a as extra from ModelA)
UNION
(select extra, field_1b as field1, field_2b as field2 from ModelB)
Because annotated fields are listed after the real db fields, the extra
of ModelB
is mixed with field1
of ModelB
. TO make sure that you get correct results, please make sure that the ordering of fields in generated SQL is always correct - with or without annotation. In this case, I will suggest to annotate extra
on ModelB
as well.
Django multiple annotate with Sum get wrong answer
You have stumbled upon the problem that Combining multiple aggregations [Django docs] will yield the wrong results because joins are used instead of subqueries.
Since to make aggregations over relations Django makes joins and with aggregation on multiple relations there would be multiple joins the results of course are wrong. Therefore we need to annotate using subqueries:
from django.db.models import OuterRef, Subquery
class DonationQuerySet(QuerySet):
...
def with_donations_stats(self):
# Subquery for wallet donation
# Added order_by because it appears you have some default ordering
wallet_donation = WalletTransaction.objects.filter(
condition=OuterRef('pk')
).order_by().values('condition').annotate(amount_sum=Sum('amount')).values('amount_sum')[:1]
# Subquery for normal donation
normal_donation = Transaction.objects.filter(
condition=OuterRef('pk')
).values('condition').annotate(amount_sum=Sum('amount')).values('amount_sum')[:1]
return self.annotate(
wallet_donation=Coalesce(Subquery(wallet_donation), 0),
normal_donation=Coalesce(Subquery(normal_donation), 0),
total_donation=F('wallet_donation') + F('normal_donation')
)
Django ORM annotate Sum calculation wrong and its multiplying by number of entry, it's wired
If you make multiple JOINs, teh they act as a multiplier, since you make a query
SELECT SUM(purchase.amount) - SUM(consumption.amount)
FROM person
LEFT OUTER JOIN purchase
LEFT OUTER JOIN consumption
so the same purchase.amount
is repeated that many times as there are related consumption
s, and the same consumption.amount
is repeated that many times as there are related purchase
s.
You can solve this with a subquery, for example with:
person_wise_payable = Person.objects.annotate(
difference=Coalesce(Subquery(
Purchase.objects.filter(
person=OuterRef('pk')
).values('person').annotate(
sum=Sum('amount')
).values('sum')[:1]
), Value(0)) - Coalesce(Subquery(
Consumption.objects.filter(
person=OuterRef('pk')
).values('person').annotate(
sum=Sum('amount')
).values('sum')[:1]
), Value(0))
)
Django, How to make multiple annotate in a single queryset
This is how django annotate produce sql code: it's do all necessary joins and then group by over all User fields, aggregating with annotation function(count in your case). So, it joins users with all their guide likes and then with all news likes and then simply counts number of rows produced per user.
If you can, you should use raw querysets, or extra
Queryset method. E.g:
User.objects.all().extra(select={
'guide_likes': 'select count(*) from tbl_guide_likes where user_id=tbl_users.id',
'news_like': 'select count(*) from tbl_news_likes where user_id=tbl_users.id'
}).\
values_list('first_name', 'last_name', 'guide_like','news_like')
For more flexibility you can use select_params
parameter of extra
method for providing names of tables(which you can get through Model._meta
). By the way this is very unconvenient and hackish method.
Sooner or later your logic become more complicated and then you should remove it from python code to sql(stored functions/procedures) and raw queries.
Related Topics
How to Select and Order by Columns Not in Groupy by SQL Statement - Oracle
Sqlserver - How to Find Dependent Tables on My Table
How to Multiply a Single Row with a Number from Column in Sql
Oracle SQL Syntax - Check Multiple Columns for Is Not Null
String Equivalent of Sum to Concatenate
How to Analyze 'Dbcc Memorystatus' Result in SQL Server 2008
Sql Distance Query Without Trigonometry
What Are Ways to Match Street Addresses in SQL Server
What's The Default Window Frame for Window Functions
Sql Design Approach for Searching a Table with an Unlimited Number of Bit Fields
Cannot Delete and Update Records on Access Linked Table
Difference Between <> and != in Sql
How to Get First and Last Day of Week in Oracle
Delphi: Accessing JSON Objects Within a JSON Array
Association Between Two Entries in SQL Table