Annotating Django query sets through reverse foreign keys

Question

Given a simple set of models as follows:

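from django.db import models

class A(models.Model):
    pass

class B(models.Model):
    # on_delete is required from Django 2.0; CASCADE matches the old implicit default
    parent = models.ForeignKey(A, related_name='b_set', on_delete=models.CASCADE)

class C(models.Model):
    parent = models.ForeignKey(B, related_name='c_set', on_delete=models.CASCADE)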

I am looking to create a query set of the A model with two annotations. One annotation should be the number of B rows that have the A row in question as their parent. The other annotation should denote the number of B rows, again with the A object in question as parent, which have at least n objects of type C in their c_set.

As an example, consider the following database and n = 3:

Table A
id
0
1

Table B
id  parent
0   0
1   0

Table C
id  parent
0   0
1   0
2   1
3   1
4   1

I'd like to get a result of the form [(0, 2, 1), (1, 0, 0)], where each tuple is (A id, number of B rows, number of B rows with at least n C rows). The A object with id 0 has two B objects, one of which has at least three related C objects. The A object with id 1 has no B objects and therefore also no B objects with at least three C rows.
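To make the expected result concrete, here is a plain-Python sketch (no Django involved) that computes both counts from the example tables above. The names a_ids, b_rows, c_rows and expected_result are made up for illustration only; the query set I am after should produce the same tuples.

```python
from collections import Counter

# Example data from the tables above.
a_ids = [0, 1]
b_rows = [(0, 0), (1, 0)]                          # (id, parent A id)
c_rows = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)]  # (id, parent B id)


def expected_result(n):
    """Return (A id, number of B children, number of B children with >= n C rows)."""
    # Number of C rows per B id.
    c_per_b = Counter(parent for _, parent in c_rows)
    result = []
    for a_id in a_ids:
        children = [b_id for b_id, parent in b_rows if parent == a_id]
        big = [b_id for b_id in children if c_per_b[b_id] >= n]
        result.append((a_id, len(children), len(big)))
    return result


print(expected_result(3))  # [(0, 2, 1), (1, 0, 0)]
```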

The first annotation is trivial:

A.objects.annotate(annotation_1=Count('b_set'))

What I am trying to design now is the second annotation. I have managed to count the number of B rows per A where the B object has at least a single C object as follows:

A.objects.annotate(annotation_2=Count('b_set__c_set__parent', distinct=True))

But I cannot figure out how to do it with a minimum related set size other than one. Hopefully someone here can point me in the right direction. One method I was considering was to annotate the B objects within the query, instead of the A rows that annotate works on by default, but I could not find any resources on this.

Daniel Holmes · Accepted Answer · 2018-07-09 10:36:51Z

This is a complicated query, at the limits of what Django 1.11 can do. I decided to use two queries and combine their results into one list that a view can use like a queryset:

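from django.db.models import Count

n = 3  # minimum number of C rows per B, as in the question's example

# Ids of B rows that have at least n related C rows.
sub_qs = (
    C.objects
    .values('parent')               # group by the parent B
    .annotate(c_count=Count('id'))
    .order_by()                     # clear any default ordering so it cannot affect the GROUP BY
    .filter(c_count__gte=n)         # becomes a HAVING clause
    .values('parent')
)
# Per A row: the number of its B children found by sub_qs.
qs = B.objects.filter(id__in=sub_qs).values('parent_id').annotate(cnt=Count('id'))
qs_map = {x['parent_id']: x['cnt'] for x in qs}
rows = list(A.objects.annotate(annotation_1=Count('b_set')))
for row in rows:
    # A rows without any qualifying B are missing from qs_map, so default to 0.
    row.annotation_2 = qs_map.get(row.id, 0)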

The list rows is the result. The more complicated of the two queries, qs, compiles to relatively simple SQL:

>>> print(str(qs.query))
SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
FROM app_b
WHERE app_b.id IN (
    SELECT U0.parent_id AS Col1 FROM app_c U0
    GROUP BY U0.parent_id HAVING COUNT(U0.id) >= 3
)
GROUP BY app_b.parent_id;                -- (added white space and removed double quotes)
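As a quick sanity check, the SQL above can be run against the example data from the question using Python's built-in sqlite3 module. This is only a sketch under assumptions: the table names follow the printed query, but the CREATE TABLE statements are a guess at the schema, and a_sql is a hand-written stand-in for the annotation_1 query rather than Django's exact output. The final list comprehension mirrors the loop over rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema plus the example rows from the question.
conn.executescript("""
    CREATE TABLE app_a (id INTEGER PRIMARY KEY);
    CREATE TABLE app_b (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES app_a(id));
    CREATE TABLE app_c (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES app_b(id));
    INSERT INTO app_a (id) VALUES (0), (1);
    INSERT INTO app_b (id, parent_id) VALUES (0, 0), (1, 0);
    INSERT INTO app_c (id, parent_id) VALUES (0, 0), (1, 0), (2, 1), (3, 1), (4, 1);
""")

n = 3
# The query printed above, with the threshold passed as a parameter.
qs_sql = """
    SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
    FROM app_b
    WHERE app_b.id IN (
        SELECT U0.parent_id FROM app_c U0
        GROUP BY U0.parent_id HAVING COUNT(U0.id) >= ?
    )
    GROUP BY app_b.parent_id
"""
qs_map = dict(conn.execute(qs_sql, (n,)).fetchall())

# Stand-in for A.objects.annotate(annotation_1=Count('b_set')) (assumed SQL shape).
a_sql = """
    SELECT app_a.id, COUNT(app_b.id)
    FROM app_a LEFT OUTER JOIN app_b ON app_b.parent_id = app_a.id
    GROUP BY app_a.id
    ORDER BY app_a.id
"""
# Merge the two results; A rows missing from qs_map get 0.
rows = [(a_id, b_count, qs_map.get(a_id, 0)) for a_id, b_count in conn.execute(a_sql)]
print(rows)  # [(0, 2, 1), (1, 0, 0)]
```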

This simple solution is easier to modify and test.

Note: A single-query solution also exists, but it doesn't seem useful. It would require Subquery and OuterRef(). They are great, but in general an aggregate such as Count() is not supported in queries that are compiled together with join resolution. A subquery can be kept separate by using a ...__in=... lookup so that Django can compile it, but then OuterRef() cannot be used. Written without OuterRef(), the result is nested SQL so complicated and suboptimal that the time complexity would probably be O(n²) in the size of the A table for many (or all) database backends. I have not tested this.
