
The relationship between Foo and Bar is through Baz as follows:

class Foo(Model):
    # stuff

class Bar(Model):
    # stuff

class Baz(Model):
    foos = ManyToManyField("Foo")
    bar = ForeignKey("Bar")

I basically need to generate the following dict representing the Bars that are related to each Foo through Baz (in dict comprehension pseudo-code):

{ foo.id: [list of unique bars related to the foo through any baz] for foo in all foos}

I can currently generate my data structure with O(N) queries (one query per Foo), but with lots of data this is a bottleneck. I need it optimized to O(1) (not a single query per se, but a fixed number of queries irrespective of the data size of any of the models), while also minimizing iteration over the data in Python.
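For concreteness, the grouping step itself (independent of how the rows are fetched) can be sketched in plain Python. The `rows` list of (foo_id, bar_id) pairs here is hypothetical stand-in data, not actual Django output:

```python
from collections import defaultdict

# Hypothetical flat rows of (foo_id, bar_id) pairs, as a single
# joined query might return them (possibly with duplicates).
rows = [(1, 10), (1, 11), (1, 10), (2, 11)]

grouped = defaultdict(set)
for foo_id, bar_id in rows:
    grouped[foo_id].add(bar_id)  # set membership enforces uniqueness

result = dict(grouped)
# result == {1: {10, 11}, 2: {11}}
```

This is a single pass over the rows, so the Python-side work is linear in the number of (foo, bar) pairs regardless of how many queries produced them.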

  • How would you write it in SQL? Commented Jan 8, 2015 at 23:31
  • If I knew that, I could translate it to Django. I am proficient with basic SQL but am, alas, not an OUTER JOIN ninja. :) Commented Jan 8, 2015 at 23:34

2 Answers


If you can drop to SQL, you could use a single query (the app name should prefix all the table names):

select distinct foo.id, bar.id
from baz_foos
join baz on baz_foos.baz_id = baz.id
join foo on baz_foos.foo_id = foo.id
join bar on baz.bar_id = bar.id

baz_foos is the many-to-many table Django creates.

@Alasdair's solution is probably more readable (although if you're doing this for performance reasons, readability might not be the top priority). His solution also uses exactly two queries, which is hardly a difference. The only problem I see is if you have a large number of Baz objects, since the generated SQL looks like this:

SELECT "foobar_baz"."id", "foobar_baz"."bar_id", "foobar_bar"."id" 
FROM "foobar_baz" 
INNER JOIN "foobar_bar" ON ("foobar_baz"."bar_id" = "foobar_bar"."id")

SELECT
    ("foobar_baz_foos"."baz_id") AS "_prefetch_related_val", 
    "foobar_foo"."id" 
FROM "foobar_foo" 
INNER JOIN "foobar_baz_foos" ON ("foobar_foo"."id" = "foobar_baz_foos"."foo_id") 
WHERE "foobar_baz_foos"."baz_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
    35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 
    55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 
    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 
    95, 96, 97, 98, 99, 100, 101)

If you have only a few Bars and a few hundred Foos, I would do:

from collections import defaultdict

from django.db import connection

# The raw query from above.
sql = """
select distinct foo.id, bar.id
from baz_foos
join baz on baz_foos.baz_id = baz.id
join foo on baz_foos.foo_id = foo.id
join bar on baz.bar_id = bar.id
"""

# foos = {f.id: f for f in Foo.objects.all()}
bars = {b.id: b for b in Bar.objects.all()}

c = connection.cursor()
c.execute(sql)
d = defaultdict(set)
for f_id, b_id in c.fetchall():
    d[f_id].add(bars[b_id])
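If the question's `{foo.id: [list of unique bars]}` shape is needed rather than sets, the defaultdict built above can be converted in one pass. A minimal sketch, with stand-in ids in place of the Bar instances the real dict would hold:

```python
from collections import defaultdict

# Stand-in for the defaultdict of sets built above; real values
# would be Bar model instances rather than plain ids.
d = defaultdict(set)
d[1].update({10, 11})
d[2].add(11)

# One pass to turn each set into a list (set iteration order is arbitrary).
result = {foo_id: list(bars) for foo_id, bars in d.items()}
```

Uniqueness is already guaranteed by the sets, so the conversion is just a reshaping step.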

3 Comments

Yeah, unfortunately of the data sets, baz is typically the largest and least limited (there are typically no more than a few Bars and maybe 30-100 Foos), so I may go ahead and write the SQL if I can figure out how to make it play nice with Python.
I've updated the answer (the solution is basically doing a manual 'prefetch_related' on Bar).
I do like the straightforwardness and cleanness of Alasdair's method, but this is great for situations where optimization is key and Baz has a big data set. Learned some good things here. Thanks, thebjorn.

Using select_related and prefetch_related, I think you can build the required data structure with 2 queries:

out = {}
bazes = Baz.objects.select_related('bar').prefetch_related('foos')
for baz in bazes:
    for foo in baz.foos.all():
        out.setdefault(foo.id, set()).add(baz.bar)

The values of the output dictionary are sets, not lists as in your question, to ensure uniqueness.
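As the comments below note, `setdefault` can be swapped for `collections.defaultdict(set)` with the same result. A sketch of that variant using stand-in ids instead of model instances, with a final `dict()` conversion (which sidesteps the Django-template gotcha mentioned in the comments):

```python
from collections import defaultdict

# Hypothetical (foo_id, bar_id) pairs; real code would iterate the
# select_related/prefetch_related queryset shown above.
pairs = [(1, 10), (2, 10), (1, 11)]

out = defaultdict(set)
for foo_id, bar_id in pairs:
    out[foo_id].add(bar_id)

# Convert back to a plain dict before handing it to a Django template,
# since defaultdict's auto-creation of missing keys behaves surprisingly there.
out = dict(out)
```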

5 Comments

Why not use a collections.defaultdict(set)?
There's a gotcha when using a defaultdict in a Django template. No other reason. Using a defaultdict would be fine, and you could convert to a regular dict afterwards if required.
Interesting bit from the 1.7 docs regarding large datasets: "prefetch_related in most cases will be implemented using an SQL query that uses the 'IN' operator. This means that for a large QuerySet a large 'IN' clause could be generated, which, depending on the database, might have performance problems of its own when it comes to parsing or executing the SQL query. Always profile for your use case!"
Yeah I've noticed that prefetch_related basically uses an IN clause with a list of ID's. I'll profile it as I implement this and see what we're looking at.
Because it's so straightforward, I think this is the correct answer for everything except special situations (where optimization is paramount and Baz has a huge data set) warranting the complexity of @thebjorn's response.
