Django Annotate() Multiple Times Causes Wrong Answers

Incorrect results with `annotate` + `values` + `union` in Django

After some debugging and going through the source code, I have an idea why this is happening. What I am going to do is try to explain that why doing annotate + values results in displaying the id and what is the difference between the two cases above.

To keep things simple, I will write also write the possible resulting sql query for each statement.

1. annotate first but get values on union query

qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a"))

When writing something like this, django will get all the fields + annotated fields, so the resulting sql query looks like:

select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA

So, if we have a query which is the result of:

qs = qs1.union(qs2)

the resulting sql for django looks like:

(select id, field_1a, field_2a, field_1a as field1, field_2a as field2 from ModelA)
UNION
(select id, field_1b, field_2b, field_1b as field1, field_2b as field2 from ModelB)

Let's go deeper into how this sql is generated. When we do a union, a combinator and combined_queries is set on the qs.query and the resulting sql is generated by combining the sql of individual queries. So, in summary:

qs.sql == qs1.sql UNION qs2.sql # in abstract sense

When, we do qs.values('field1', 'field2'), the col_count in compiler is set to 2 which is the number of fields. As you can see that the union query above returns 5 columns but in the final return from compiler each row in the results is sliced using col_count. Now, this results with only 2 columns is passed back to ValuesIterable where it maps each name in the selected fields with the resulting columns. That is how it leads to the incorrect results.

2. annotate + values on individual queries and then perform union

Now, let's see what happens when annotate is used with values directly

qs1 = ModelA.objects.all().annotate(field1=F("field_1a"), field2=F("field_2a")).values('field1', 'field2')

The resulting sql is:

select field_1a as field1, field_2a as field2 from ModelA

Now, when we do the union:

qs = qs1.union(qs2)

the sql is:

(select field_1a as field1, field_2a as field2 from ModelA)
UNION
(select field_1b as field1, field_2b as field2 from ModelB)

Now, when qs.values('field1', 'field2') executes, the number of columns returned from union query has 2 columns which is same as the col_count which is 2 and each field is matched with the individual columns producing the expected result.


3. Different field annotation count and ordering of fields

In the OP, there is a scenario when even using .values before union doesn't produce correct results. The reason for that is that in the ModelB, there is no annotation for extra field.

So, let's look at the queries generated for each model:

ModelA.objects.all()
.annotate(
field1=F("field_1a"), field2=F("field_2a"), extra=F("extra_a")
)
.values(*values)

The SQL becomes:

select field_1a as field1, field_2a as field2, extra_a as extra from ModelA

For ModelB:

ModelB.objects.all()
.annotate(field1=F("field_1b"), field2=F("field_2b"))
.values(*values)

SQL:

select extra, field_1b as field1, field_2b as field2 from ModelB

and the union is:

(select field_1a as field1, field_2a as field2, extra_a as extra from ModelA)
UNION
(select extra, field_1b as field1, field_2b as field2 from ModelB)

Because annotated fields are listed after the real db fields, the extra of ModelB is mixed with field1 of ModelB. TO make sure that you get correct results, please make sure that the ordering of fields in generated SQL is always correct - with or without annotation. In this case, I will suggest to annotate extra on ModelB as well.

Django multiple annotate with Sum get wrong answer

You have stumbled upon the problem that Combining multiple aggregations [Django docs] will yield the wrong results because joins are used instead of subqueries.

Since to make aggregations over relations Django makes joins and with aggregation on multiple relations there would be multiple joins the results of course are wrong. Therefore we need to annotate using subqueries:

from django.db.models import OuterRef, Subquery

class DonationQuerySet(QuerySet):
...

def with_donations_stats(self):
# Subquery for wallet donation
# Added order_by because it appears you have some default ordering
wallet_donation = WalletTransaction.objects.filter(
condition=OuterRef('pk')
).order_by().values('condition').annotate(amount_sum=Sum('amount')).values('amount_sum')[:1]

# Subquery for normal donation
normal_donation = Transaction.objects.filter(
condition=OuterRef('pk')
).values('condition').annotate(amount_sum=Sum('amount')).values('amount_sum')[:1]

return self.annotate(
wallet_donation=Coalesce(Subquery(wallet_donation), 0),
normal_donation=Coalesce(Subquery(normal_donation), 0),
total_donation=F('wallet_donation') + F('normal_donation')
)

Django ORM annotate Sum calculation wrong and its multiplying by number of entry, it's wired

If you make multiple JOINs, teh they act as a multiplier, since you make a query

SELECT SUM(purchase.amount) - SUM(consumption.amount)
FROM person
LEFT OUTER JOIN purchase
LEFT OUTER JOIN consumption

so the same purchase.amount is repeated that many times as there are related consumptions, and the same consumption.amount is repeated that many times as there are related purchases.

You can solve this with a subquery, for example with:

person_wise_payable = Person.objects.annotate(
difference=Coalesce(Subquery(
Purchase.objects.filter(
person=OuterRef('pk')
).values('person').annotate(
sum=Sum('amount')
).values('sum')[:1]
), Value(0)) - Coalesce(Subquery(
Consumption.objects.filter(
person=OuterRef('pk')
).values('person').annotate(
sum=Sum('amount')
).values('sum')[:1]
), Value(0))
)

Django, How to make multiple annotate in a single queryset

This is how django annotate produce sql code: it's do all necessary joins and then group by over all User fields, aggregating with annotation function(count in your case). So, it joins users with all their guide likes and then with all news likes and then simply counts number of rows produced per user.

If you can, you should use raw querysets, or extra Queryset method. E.g:

User.objects.all().extra(select={
'guide_likes': 'select count(*) from tbl_guide_likes where user_id=tbl_users.id',
'news_like': 'select count(*) from tbl_news_likes where user_id=tbl_users.id'
}).\
values_list('first_name', 'last_name', 'guide_like','news_like')

For more flexibility you can use select_params parameter of extra method for providing names of tables(which you can get through Model._meta). By the way this is very unconvenient and hackish method.
Sooner or later your logic become more complicated and then you should remove it from python code to sql(stored functions/procedures) and raw queries.



Related Topics



Leave a reply



Submit