Given a simple set of models as follows:

class A(models.Model):
    pass

class B(models.Model):
    parent = models.ForeignKey(A, related_name='b_set')

class C(models.Model):
    parent = models.ForeignKey(B, related_name='c_set')

I am looking to create a query set of the A model with two annotations. One annotation should be the number of B rows that have the A row in question as their parent. The other annotation should denote the number of B rows, again with the A object in question as parent, which have at least n objects of type C in their c_set.

As an example, consider the following database and n = 3:

Table A
id
0
1

Table B
id  parent
0   0
1   0

Table C
id parent
0   0
1   0
2   1
3   1
4   1

I'd like to be able to get a result of the form [(0, 2, 1), (1, 0, 0)] as the A object with id 0 has two B objects of which one has at least three related C objects. The A object with id 1 has no B objects and therefore also no B objects with at least three C rows.
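
To make the target concrete, here is a minimal pure-Python sketch (no Django involved) that computes the expected tuples straight from the example tables above. The variable names are made up for illustration; it only serves as a reference result to compare a queryset against.

```python
from collections import Counter

# Example data from the tables above.
a_ids = [0, 1]
b_rows = [(0, 0), (1, 0)]                          # (B id, parent A id)
c_rows = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)]  # (C id, parent B id)
n = 3

# Number of C rows per B row, and number of B rows per A row.
c_per_b = Counter(parent for _, parent in c_rows)
b_per_a = Counter(parent for _, parent in b_rows)

# Number of B rows per A row that have at least n C rows.
big_b_per_a = Counter(
    parent for b_id, parent in b_rows if c_per_b[b_id] >= n
)

# Missing keys in a Counter read as 0, which covers A rows without B rows.
expected = [(a, b_per_a[a], big_b_per_a[a]) for a in a_ids]
print(expected)
```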

The first annotation is trivial:

A.objects.annotate(annotation_1=Count('b_set'))

What I am trying to design now is the second annotation. I have managed to count the number of B rows per A where the B object has at least a single C object as follows:

A.objects.annotate(annotation_2=Count('b_set__c_set__parent', distinct=True))

But I cannot figure out a way to do it with a minimum related set size other than one. Hopefully someone here can point me in the right direction. One method I was thinking of was somehow annotating the B objects in the query instead of the A rows as is the default of the annotate method but I could not find any resources on this.

1 Answer

This is a complicated query at the limits of what Django 1.11 can express. I decided to run two queries and combine their results into one list that a view can use much like a queryset:

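from django.db.models import Count

# n is the minimum number of related C rows, e.g. n = 3 as in the question.

# Ids of the B rows that have at least n C rows.
sub_qs = (
    C.objects
    .values('parent')
    .annotate(c_count=Count('id'))
    .order_by()  # clear any ordering so that it does not affect GROUP BY
    .filter(c_count__gte=n)
    .values('parent')
)
# Per A id: the number of B rows that have at least n C rows.
qs = B.objects.filter(id__in=sub_qs).values('parent_id').annotate(cnt=Count('id'))
qs_map = {x['parent_id']: x['cnt'] for x in qs}
# Second query: all A rows with their total number of B rows.
rows = list(A.objects.annotate(annotation_1=Count('b_set')))
for row in rows:
    # A rows missing from qs_map have no B row with enough C rows.
    row.annotation_2 = qs_map.get(row.id, 0)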

The list rows is the result. The more complicated query, qs, compiles to relatively simple SQL:

>>> print(str(qs.query))
SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
FROM app_b
WHERE app_b.id IN (
    SELECT U0.parent_id AS Col1 FROM app_c U0
    GROUP BY U0.parent_id HAVING COUNT(U0.id) >= 3
)
GROUP BY app_b.parent_id;                -- (added white space and removed double quotes)

This simple solution is easier to modify and test.
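
As a rough check that can be run without a Django project, the sketch below executes the same SQL against an in-memory SQLite database loaded with the example data from the question. The table and column names come from the printed query; the app_a table, the column types and the LEFT JOIN used for the first annotation are assumptions made for illustration, not what Django would necessarily generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the columns that the queries below need.
conn.executescript("""
    CREATE TABLE app_a (id INTEGER PRIMARY KEY);
    CREATE TABLE app_b (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE app_c (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO app_a (id) VALUES (0), (1);
    INSERT INTO app_b (id, parent_id) VALUES (0, 0), (1, 0);
    INSERT INTO app_c (id, parent_id)
        VALUES (0, 0), (1, 0), (2, 1), (3, 1), (4, 1);
""")

n = 3

# The SQL compiled from qs, with n passed as a parameter.
qs_map = dict(conn.execute("""
    SELECT app_b.parent_id, COUNT(app_b.id) AS cnt
    FROM app_b
    WHERE app_b.id IN (
        SELECT U0.parent_id FROM app_c U0
        GROUP BY U0.parent_id HAVING COUNT(U0.id) >= ?
    )
    GROUP BY app_b.parent_id
""", (n,)))

# Stand-in for A.objects.annotate(annotation_1=Count('b_set')).
rows = conn.execute("""
    SELECT app_a.id, COUNT(app_b.id)
    FROM app_a LEFT JOIN app_b ON app_b.parent_id = app_a.id
    GROUP BY app_a.id ORDER BY app_a.id
""").fetchall()

# Merge both results the same way as the loop over rows above.
result = [(a_id, b_count, qs_map.get(a_id, 0)) for a_id, b_count in rows]
print(result)
```

With the example data this should reproduce the tuples asked for in the question, which makes it easy to try other values of n or other rows.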


Note: A single-query solution also exists, but it does not seem useful. It would require Subquery and OuterRef(). They are great, but in general an aggregate such as Count() is not supported in queries that are compiled together with join resolution. A subquery can be kept separate through an ...__in=... lookup so that Django can compile it, but then OuterRef() cannot be used. Written without OuterRef(), the result is nested SQL so complicated and suboptimal that its time complexity would probably be O(n²) in the size of the A table on many (or all) database backends. Not tested.
