I have two models in Django, Song and Album, where an Album has many Songs. I am trying to filter for Albums that have valid Songs: at least one Song must have an audio file for the Album to be returned by the filter. I am using Postgres.
I am trying to express this logic as a Django QuerySet, but I am not sure how to get the ORM to emit WHERE EXISTS (...) instead of annotating the row with EXISTS (...) and then comparing that value to true.
The following is the Django ORM statement I am trying to get to work:
from django.db.models import Exists, OuterRef

valid_songs = Song.objects.filter(
    album=OuterRef('pk'),
    audio_file__isnull=False).only("album")
Album.objects.annotate(
    valid_song=Exists(valid_songs)).filter(
    valid_song=True).query
This is the query that is generated:
SELECT "api_album"."id",
       "api_album"."created_at",
       "api_album"."updated_at",
       "api_album"."title",
       "api_album"."artwork_file_id",
       "api_album"."user_id",
       "api_album"."description",
       "api_album"."tags",
       "api_album"."genres",
       EXISTS(SELECT U0."id",
                     U0."album_id"
              FROM "api_song" U0
              WHERE ( U0."album_id" = ( "api_album"."id" )
                      AND U0."audio_file_id" IS NOT NULL )) AS "valid_song"
FROM "api_album"
WHERE EXISTS(SELECT U0."id",
                    U0."album_id"
             FROM "api_song" U0
             WHERE ( U0."album_id" = ( "api_album"."id" )
                     AND U0."audio_file_id" IS NOT NULL )) = true
This is the Postgres query plan for the query above, as generated by Django's QuerySet:
Seq Scan on api_album (cost=0.00..287.95 rows=60 width=641)
  Filter: (alternatives: SubPlan 3 or hashed SubPlan 4)
  SubPlan 3
    -> Seq Scan on api_song u0_2 (cost=0.00..1.54 rows=1 width=0)
          Filter: ((audio_file_id IS NOT NULL) AND (album_id = api_album.id))
  SubPlan 4
    -> Seq Scan on api_song u0_3 (cost=0.00..1.43 rows=10 width=4)
          Filter: (audio_file_id IS NOT NULL)
  SubPlan 1
    -> Seq Scan on api_song u0 (cost=0.00..1.54 rows=1 width=0)
          Filter: ((audio_file_id IS NOT NULL) AND (album_id = api_album.id))
  SubPlan 2
    -> Seq Scan on api_song u0_1 (cost=0.00..1.43 rows=10 width=4)
          Filter: (audio_file_id IS NOT NULL)
(14 rows)
However, there is a much more efficient query for this:
SELECT *
FROM "api_album"
WHERE EXISTS(SELECT U0."id",
                    U0."album_id"
             FROM "api_song" U0
             WHERE ( U0."album_id" = ( "api_album"."id" )
                     AND U0."audio_file_id" IS NOT NULL ))
Its query plan:
Hash Semi Join (cost=1.55..13.26 rows=10 width=640)
  Hash Cond: (api_album.id = u0.album_id)
  -> Seq Scan on api_album (cost=0.00..11.20 rows=120 width=640)
  -> Hash (cost=1.43..1.43 rows=10 width=4)
        -> Seq Scan on api_song u0 (cost=0.00..1.43 rows=10 width=4)
              Filter: (audio_file_id IS NOT NULL)
(6 rows)
So my questions are as follows:
- What is the difference between WHERE EXISTS (...) and an annotated EXISTS (...) compared to true in this scenario, and why do they not produce the same query plan?
- How do I get the Django ORM to generate the more efficient query?
Edit: the Django models are as follows:
from django.conf import settings
from django.contrib.postgres.fields import ArrayField
from django.db import models

# BaseModel, S3File and default_arr are defined elsewhere in the project (not shown).


class Album(BaseModel):
    title = models.CharField(max_length=255, blank=False)
    artwork_file = models.ForeignKey(
        S3File, null=True, on_delete=models.CASCADE,
        related_name="album_artwork_file")
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             related_name="albums",
                             on_delete=models.CASCADE)
    description = models.TextField(blank=True)
    tags = ArrayField(models.CharField(
        max_length=16), default=default_arr)
    genres = ArrayField(models.CharField(
        max_length=16), default=default_arr)


class Song(BaseModel):
    title = models.CharField(max_length=255, blank=False)
    album = models.ForeignKey(Album,
                              related_name="songs",
                              on_delete=models.CASCADE)
    audio_file = models.ForeignKey(
        S3File, null=True, on_delete=models.CASCADE,
        related_name="song_audio_file")
The following does NOT work: filtering across the songs relation joins the tables and returns one Album row per matching Song, so calling get() on this QuerySet raises an exception:
Album.objects.filter(songs__audio_file__isnull=False).get(pk=1)
Album.MultipleObjectsReturned: get() returned more than one Album
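To illustrate why the join-based filter returns the same album more than once while an EXISTS filter does not, here is a minimal sketch. It uses an in-memory SQLite database instead of Postgres, the table and column names are borrowed from the generated SQL above, and the rows are made up for the example, so treat it as an approximation of what the ORM does rather than its exact output.

```python
import sqlite3

# Hypothetical miniature of the schema: names follow the generated SQL above,
# the data is invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE api_album (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE api_song (
        id INTEGER PRIMARY KEY,
        album_id INTEGER REFERENCES api_album(id),
        audio_file_id INTEGER
    );
    INSERT INTO api_album (id, title) VALUES (1, 'A'), (2, 'B');
    -- Album 1 has two songs with audio; album 2 has one song without audio.
    INSERT INTO api_song (id, album_id, audio_file_id)
    VALUES (1, 1, 10), (2, 1, 11), (3, 2, NULL);
""")

# Approximately what filter(songs__audio_file__isnull=False) does:
# an inner join, which yields one album row per matching song.
join_rows = conn.execute("""
    SELECT a.id
    FROM api_album a
    INNER JOIN api_song s ON s.album_id = a.id
    WHERE s.audio_file_id IS NOT NULL AND a.id = 1
""").fetchall()

# The EXISTS form only tests whether a matching song exists,
# so each album appears at most once.
exists_rows = conn.execute("""
    SELECT a.id
    FROM api_album a
    WHERE EXISTS (SELECT 1
                  FROM api_song s
                  WHERE s.album_id = a.id
                    AND s.audio_file_id IS NOT NULL)
      AND a.id = 1
""").fetchall()

print(join_rows)    # [(1,), (1,)]
print(exists_rows)  # [(1,)]
```

With two rows coming back for pk=1, get() has no single object to return, which matches the MultipleObjectsReturned error above.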
The QuerySet is used with a Django REST framework ModelViewSet, where it backs the CRUD operations and is passed to the Album serializer. This requires get() to work and return a single object.
from django.db.models import Exists, OuterRef
from rest_framework import viewsets


class AlbumViewSet(viewsets.ModelViewSet):
    serializer_class = AlbumSerializer

    def get_queryset(self):
        valid_songs = Song.objects.filter(
            album=OuterRef('pk'),
            audio_file__isnull=False).only('album')
        # Slow query posted above
        return Album.objects.annotate(
            valid_song=Exists(valid_songs)
        ).filter(valid_song=True)