Optimizing database queries in Django REST framework
Django REST Framework cannot automatically optimize queries for you, in the same way that Django itself won't. There are places you can look at for tips, including the Django documentation. It has been mentioned that Django REST Framework should automatically, though there are some challenges associated with that.
This question is very specific to your case, where you are using a custom SerializerMethodField
that makes a request for each object that is returned. Because you are making a new request (using the Friends.objects
manager), it is very difficult to optimize the query.
You can make the problem better though, by not creating a new queryset and instead getting the friend count from other places. This will require a backwards relation to be created on the Friendship
model, most likely through the related_name
parameter on the field, so you can prefetch all of the Friendship
objects. But this is only useful if you need the full objects, and not just a count of the objects.
This would result in a view and serializer similar to the following:
class Friendship(models.Model):
from_friend = models.ForeignKey(User, related_name="friends")
to_friend = models.ForeignKey(User)
class GetAllUsers(generics.ListAPIView):
...
def get_queryset(self):
return User.objects.all().prefetch_related("friends")
class GetAllUsersSerializer(serializers.ModelSerializer):
...
def get_is_friend_already(self, obj):
request = self.context.get('request', None)
friends = set(friend.from_friend_id for friend in obj.friends)
if request.user != obj and request.user.id in friends:
return True
else:
return False
If you just need a count of the objects (similar to using queryset.count()
or queryset.exists()
), you can include annotate the rows in the queryset with the counts of reverse relationships. This would be done in your get_queryset
method, by adding .annotate(friends_count=Count("friends"))
to the end (if the related_name
was friends
), which will set the friends_count
attribute on each object to the number of friends.
This would result in a view and serializer similar to the following:
class Friendship(models.Model):
from_friend = models.ForeignKey(User, related_name="friends")
to_friend = models.ForeignKey(User)
class GetAllUsers(generics.ListAPIView):
...
def get_queryset(self):
from django.db.models import Count
return User.objects.all().annotate(friends_count=Count("friends"))
class GetAllUsersSerializer(serializers.ModelSerializer):
...
def get_is_friend_already(self, obj):
request = self.context.get('request', None)
if request.user != obj and obj.friends_count > 0:
return True
else:
return False
Both of these solutions will avoid N+1 queries, but the one you pick depends on what you are trying to achieve.
Optimize Django Rest ORM queries
There's no silver bullet without looking at the code (and the profiling results) in detail.
The only thing that is a no-brainer is enforcing relationships in the models and in the database. This prevents a whole host of bugs, encourages the use of standardized, performant access (rather than concocting SQL on the spot which more often than not is likely to be buggy and slow) and makes your code both shorter and a lot more readable.
Other than that, 50-60 queries can be a lot (if you could do the same job with one or two) or it can be just right - it depends on what you achieve with them.
The use of prefetch_related
and select_related
is important, yes – but only if used correctly; otherwise it can slow you down instead of speeding you up.
Nested serializers are the correct approach if you need the data – but you need to set up your querysets properly in your viewset if you want them to be fast.
Time the main parts of slow views, inspect the SQL queries sent and check if you really need all data that is returned.
Then you can look at the sore spots and gain time where it matters. Asking specific questions on SO with complete code examples can also get you far fast.
If you have just one top-level object, you can refine the approach offered by @jensmtg, doing all the prefetches that you need at that level and then for the lower levels just using ModelSerializer
s (not SerializerMethodField
s) that access the prefetched objects. Look into the Prefetch object that allows nested prefetching.
But be aware that prefetch_related
is not for free, it involves quite some processing in Python; you may be better off using flat (db-view-like) joined queries with values()
and values_list
.
Related Topics
Generate a List of Datetimes Between an Interval
Subsampling Every Nth Entry in a Numpy Array
Using a Dictionary to Select Function to Execute
How to Tell Distutils to Use Gcc
Instance Attribute Attribute_Name Defined Outside _Init_
Sql-Like Window Functions in Pandas: Row Numbering in Python Pandas Dataframe
Find Nearest Indices for One Array Against All Values in Another Array - Python/Numpy
Cancel an Already Executing Task with Celery
Calculate Time Difference Between Pandas Dataframe Indices
Handling Multiple Requests in Flask
Overloaded Functions in Python
Unique Combinations of Values in Selected Columns in Pandas Data Frame and Count
How to Insert Data into a MySQL Database
Python Urllib2 with Keep Alive
Conversion of Strings Like \\Uxxxx in Python
How to Use a Conditional Expression (Expression with If and Else) in a List Comprehension