Using: Django 1.11, Postgres 9.6
I need to optimise a Django ORM query for consumption by the Django Rest Framework. The query must return records that:
- Match a list of IDs in their target field
- Limit the result set to records where a source ID appears more than once in the set.
The current approach is to create a subquery using annotate and Count, but the sheer amount of processing behind each request, added to pagination, means that it's causing timeouts or very slow behaviour.
If there is anything that can be done by Postgres on the server as a raw query I'm fine with that.
Model:
from django.db import models

class Relationship(models.Model):
    id = models.AutoField(primary_key=True)
    source = models.BigIntegerField(db_index=True)
    target = models.BigIntegerField(db_index=True)
View snippet:
from django.db.models import Count

match_list = [123, 456, 789]  # dummy data for example
queryset = Relationship.objects.filter(target__in=match_list)
# Sources that occur more than once among the matching rows
sub_queryset = (
    Relationship.objects.filter(target__in=match_list)
    .values('source')
    .annotate(source_count=Count("source"))
    .filter(source_count__gt=1)
)
# The subquery is evaluated here and its IDs are pulled into Python
sub_ids = [i["source"] for i in sub_queryset]
queryset = queryset.filter(source__in=sub_ids)
The API takes a list of target IDs as an argument and responds with a list of all source IDs that are connected to that target. However, I'm filtering the queryset to only return source records that are connected to two or more targets.
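To pin down the intended behaviour independently of the ORM, here is a minimal plain-Python sketch of the filter described above. The function name and the (source, target) tuples are hypothetical stand-ins for the table rows. It counts matching rows per source, which mirrors Count("source") and is only the same as "two or more targets" if (source, target) pairs are unique.

```python
from collections import Counter

def sources_with_multiple_matches(rows, match_list):
    """Keep matching rows whose source occurs more than once among the matches.

    rows is an iterable of (source, target) pairs; this is a hypothetical
    in-memory stand-in for the Relationship table.
    """
    wanted = set(match_list)
    # Step 1: rows whose target is in the requested list
    matched = [(s, t) for s, t in rows if t in wanted]
    # Step 2: count how often each source appears in that set
    counts = Counter(s for s, _ in matched)
    # Step 3: keep only rows whose source appears more than once
    return [(s, t) for s, t in matched if counts[s] > 1]

rows = [(1, 123), (1, 456), (2, 123), (3, 789), (3, 999), (4, 456), (4, 789)]
result = sources_with_multiple_matches(rows, [123, 456, 789])
```

With this dummy data, sources 1 and 4 are kept; source 3 is dropped because only one of its rows matches the list.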
As background, the resulting queryset will be served by Django Rest Framework and it's currently causing timeouts because the requests get exponentially longer the more
Note: I'm putting this on SO because it's causing my requests to time out and therefore causing a fault. I know I could extend the timeout duration but would rather optimise the query. I considered CodeReview but felt this was more appropriate.
Edit 1: Following @albar's suggestion: it's currently a separate subquery because the annotate / Count aggregation step only works if values are returned, not full records. With
Relationship.objects.filter(target__in=match_list).annotate(source_count=Count("source")).filter(source_count__gt=1)
the source_count value never gets above 1. I had to do the .values operation to enable the aggregation across the whole result set. I think the answer might lie in doing a subquery rather than multiple queries but I've not got much experience in that yet.
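The single-subquery idea could be sketched in raw SQL roughly as follows, so that the grouping and the outer filter happen in one statement on the database server. This is only an illustration under assumptions: it runs against an in-memory SQLite database as a stand-in for Postgres, the table name relationship is hypothetical (Django would normally prefix it with the app label), and with psycopg2 the placeholders would be %s rather than ?.

```python
import sqlite3

# Hypothetical table name; the GROUP BY / HAVING subquery picks sources
# that occur more than once among the rows matching the target list.
SQL = """
SELECT r.id, r.source, r.target
FROM relationship AS r
WHERE r.target IN ({ph})
  AND r.source IN (
      SELECT source
      FROM relationship
      WHERE target IN ({ph})
      GROUP BY source
      HAVING COUNT(*) > 1
  )
ORDER BY r.id
"""

def fetch_multi_matched(conn, match_list):
    """Return (id, source, target) rows in a single round trip."""
    if not match_list:
        return []  # avoid generating an empty IN () list
    ph = ", ".join("?" for _ in match_list)
    # The list is bound twice: once for the outer filter, once for the subquery
    params = list(match_list) * 2
    return conn.execute(SQL.format(ph=ph), params).fetchall()

# Dummy data for the example
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationship ("
    "id INTEGER PRIMARY KEY, source INTEGER, target INTEGER)"
)
conn.executemany(
    "INSERT INTO relationship (source, target) VALUES (?, ?)",
    [(1, 123), (1, 456), (2, 123), (3, 789), (3, 999), (4, 456), (4, 789)],
)
rows = fetch_multi_matched(conn, [123, 456, 789])
```

On the ORM side, a similar effect should be achievable by passing the unevaluated sub-queryset to source__in instead of building sub_ids in Python, with a trailing .values('source') so the subquery selects a single column. I have not measured either variant against the real data, so treat both as starting points rather than a confirmed fix.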