4

If A contains a set of B and B contains a set of C, then I am looking for a way to start with A and end up with a query set of C.

A simple example:

class Book(models.Model):
    name = models.CharField(max_length=64)


class Page(models.Model):
    number = models.IntegerField()
    book = models.ForeignKey(Book)


class Paragraph(models.Model):
    number = models.IntegerField()
    page = models.ForeignKey(Page)


def query():
    books = Book.objects.all()\
       .prefetch_related('page_set', 'page_set__paragraph_set')

    for book in books:
        pages = book.page_set

        # I need to do something like this
        paragraphs = pages.all().paragraph_set
        # invalid

        # or
        paragraphs = book.page_set.select_related('paragraph_set')
        # valid, but paragraphs is still a QuerySet of Pages

        # this works, but results in one query for EVERY book,
        # which is what I need to avoid
        paragraphs = Paragraph.objects.filter(page__book=book)


        # do stuff with the book
        #...


        # do stuff with the paragraphs in the book
        # ...

How do I get a query set of paragraphs from just a Book instance?

The named args syntax for Django queries supports an infinite level of nesting of sets/foreign key relationships, but I can't find a way to use the ORM mapping to actually get a related query set from the top-down.

And getting query-set from the bottom-up negates the benefits of prefetch_related/select_related.

The above example is a simplified version of what I need to do in my application. The database has thousands of "Books" and any n + 1 queries must be avoided.

I found a question about using prefetch across multiple levels, but the answer didn't address how to actually obtain the fetched query set for use.

1
  • Just to be clear, you don't want the bottom-up approach like this: Paragraph.objects.filter(page__book=the_book_instance) ? Commented Oct 30, 2018 at 23:13

2 Answers 2

1

After you've done the prefetch, it seems like the only cheap way to access the child records is through all(). Any filters seem to trigger another database query.

The short answer to your question about all the paragraphs in a book is to use a list comprehension with two levels:

    paragraphs = [paragraph
                  for page in book.page_set.all()
                  for paragraph in page.paragraph_set.all()]

Here's a runnable example:

# Tested with Django 1.11.13
from __future__ import print_function
import os
import sys

import django
from django.apps import apps
from django.apps.config import AppConfig
from django.conf import settings
from django.core.files.base import ContentFile, File
from django.db import connections, models, DEFAULT_DB_ALIAS
from django.db.models.base import ModelBase

from django_mock_queries.mocks import MockSet, mocked_relations

NAME = 'udjango'


def main():
    setup()

    class Book(models.Model):
        name = models.CharField(max_length=64)

    class Page(models.Model):
        number = models.IntegerField()
        book = models.ForeignKey(Book)

    class Paragraph(models.Model):
        number = models.IntegerField()
        page = models.ForeignKey(Page)

    syncdb(Book)
    syncdb(Page)
    syncdb(Paragraph)

    b = Book.objects.create(name='Gone With The Wind')
    p = b.page_set.create(number=1)
    p.paragraph_set.create(number=1)
    b = Book.objects.create(name='The Three Body Problem')
    p = b.page_set.create(number=1)
    p.paragraph_set.create(number=1)
    p.paragraph_set.create(number=2)
    p = b.page_set.create(number=2)
    p.paragraph_set.create(number=1)
    p.paragraph_set.create(number=2)

    books = Book.objects.all().prefetch_related('page_set',
                                                'page_set__paragraph_set')

    for book in books:
        print(book.name)
        paragraphs = [paragraph
                      for page in book.page_set.all()
                      for paragraph in page.paragraph_set.all()]
        for paragraph in paragraphs:
            print(paragraph.page.number, paragraph.number)


def setup():
    DB_FILE = NAME + '.db'
    with open(DB_FILE, 'w'):
        pass  # wipe the database
    settings.configure(
        DEBUG=True,
        DATABASES={
            DEFAULT_DB_ALIAS: {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': DB_FILE}},
        LOGGING={'version': 1,
                 'disable_existing_loggers': False,
                 'formatters': {
                    'debug': {
                        'format': '%(asctime)s[%(levelname)s]'
                                  '%(name)s.%(funcName)s(): %(message)s',
                        'datefmt': '%Y-%m-%d %H:%M:%S'}},
                 'handlers': {
                    'console': {
                        'level': 'DEBUG',
                        'class': 'logging.StreamHandler',
                        'formatter': 'debug'}},
                 'root': {
                    'handlers': ['console'],
                    'level': 'WARN'},
                 'loggers': {
                    "django.db": {"level": "DEBUG"}}})
    app_config = AppConfig(NAME, sys.modules['__main__'])
    apps.populate([app_config])
    django.setup()
    original_new_func = ModelBase.__new__

    @staticmethod
    def patched_new(cls, name, bases, attrs):
        if 'Meta' not in attrs:
            class Meta:
                app_label = NAME
            attrs['Meta'] = Meta
        return original_new_func(cls, name, bases, attrs)
    ModelBase.__new__ = patched_new


def syncdb(model):
    """ Standard syncdb expects models to be in reliable locations.

    Based on https://github.com/django/django/blob/1.9.3
    /django/core/management/commands/migrate.py#L285
    """
    connection = connections[DEFAULT_DB_ALIAS]
    with connection.schema_editor() as editor:
        editor.create_model(model)

main()

And here's the end of the output. You can see that it only runs one query for each table.

2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_book"."id", "udjango_book"."name" FROM "udjango_book"; args=()
2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_page"."id", "udjango_page"."number", "udjango_page"."book_id" FROM "udjango_page" WHERE "udjango_page"."book_id" IN (1, 2); args=(1, 2)
2018-10-30 15:58:25[DEBUG]django.db.backends.execute(): (0.000) SELECT "udjango_paragraph"."id", "udjango_paragraph"."number", "udjango_paragraph"."page_id" FROM "udjango_paragraph" WHERE "udjango_paragraph"."page_id" IN (1, 2, 3); args=(1, 2, 3)
Gone With The Wind
1 1
The Three Body Problem
1 1
1 2
2 1
2 2
Sign up to request clarification or add additional context in comments.

2 Comments

thanks. I had cobbled together something similar, but the double for list comprehension is much cleaner than what I had. The only issue with this and @SamMason answer is that the result is a list and not a QuerySet. In my specific use, I'm passing this result to another method that expects a QuerySet parameter with methods like .count(), and .filter(). But after doing some research it looks like there is no way to append to a QuerySet, and combining QuerySets via | or union() calls the internal _clone() method, which does not copy the cache. So future operations re-run the query.
You can use duck typing, @BrianHVB. Write a class that holds the list of paragraph objects, and has .count() and .filter() methods that operate on that list. I've used the django-mock-queries library for running Django tests without a database, but you could use a simpler version to stick to your cached data. However, it's probably simpler to modify the method that is expecting a QuerySet so that it will accept a list.
1

further to Don's answer, you can use Prefetch objects to apply any filters you want, e.g:

from django.db import models, connection

def query():
    paragraph_filter = models.Prefetch(
        'page_set__paragraph_set',
        Paragraph.objects.filter(number__gt=1))

    books = Book.objects.all().prefetch_related(
        'page_set', paragraph_filter)

    for book in books:
        for page in book.page_set.all():
            for paragraph in page.paragraph_set.all():
                print(paragraph)

    print(connection.queries)

Django takes care of ensuring that all appropriate objects gets loaded in a small number of queries (one per table, so you'll get three queries)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.