I'm using a small tree/graph package (django_dag) that gives my model a many-to-many children field which refers to itself. The basic structure can be shown with the following models:

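# models.py
from django.db import models

class Foo(FooBase):
    class Meta:
        abstract = True

    # Self-referential many-to-many; 'Bar' is given as a string
    # because the through model is defined further down.
    children = models.ManyToManyField('self', symmetrical=False,
                                      through='Bar')

class Bar(models.Model):
    # Edge table: one row per parent -> child link.
    parent = models.ForeignKey(Foo)
    child = models.ForeignKey(Foo)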

All is fine with the models and all the functionality of the package. FooBase adds a variety of functions to the model, including a way of recursively finding all children of a Foo and the children's children and so forth.

My concern is with the following function within FooBase:

def descendants_tree(self):
    tree = {}
    for f in self.children.all():
        tree[f] = f.descendants_tree()
    return tree

It outputs something like {Foo1:{}, Foo2: {Child_of_Foo2: {Child_of_Child_of_Foo2:{}}}} where the progeny are in a nested dictionary.

The alert reader may notice that this method issues a new query for each child. While these DB hits are individually quick, they add up when there are 50+ children, and eventually there will be tens of thousands of DB entries. Right now, each query averages 0.6 ms with a row count of almost 2000.
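To make the per-node query cost concrete, here is a minimal pure-Python sketch rather than real ORM code. The names `EDGES`, `children_of`, and `query_count` are hypothetical stand-ins: each call to `children_of` plays the role of one `children.all()` query.

```python
# Hypothetical stand-in for the database: parent id -> list of child ids.
EDGES = {1: [2, 3], 2: [4], 3: [], 4: []}

query_count = 0  # counts simulated queries


def children_of(node_id):
    """Simulate one children.all() query for a single node."""
    global query_count
    query_count += 1
    return EDGES.get(node_id, [])


def descendants_tree(node_id):
    """Same recursion as the method above: one simulated query per visited node."""
    return {child: descendants_tree(child) for child in children_of(node_id)}


tree = descendants_tree(1)
print(tree)         # nested dict of descendants
print(query_count)  # one simulated query per node visited
```

In this simulation the number of queries grows with the total number of nodes in the tree, not just with its depth.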

Is there a more efficient way of doing this nested query?

In my mind, doing a select_related().all() beforehand would get it down to one query but that smells like trouble in the future. At what point is one large query better or worse than many small ones?

---Edit---

Here's what I'm trying to test the select_related().all() option with, but it's still hitting every iteration:

all_foo = Foo.objects.select_related('children').all()
def loop(baz):
    # baz is the id of the node whose descendants are wanted
    tree = {}
    for f in all_foo.get(id=baz).children.all():
        tree[f] = loop(f.id)
    return tree

I assume the children.all() call is causing the hit. Is there another way to get all of the many-to-many children without using the callable attribute?

1 Answer

You'll have to test in your own environment with your own data. select_related is generally recommended, but when there are many recursive levels, the one large query tends to be slower than multiple small queries.

The number of children doesn't really matter; the number of recursion levels matters most. If you're doing 3 or so, select_related() might be better, but much more than that would likely result in a slowdown. The plugin author likely did it this way to allow for many, many levels of recursion, because the per-level approach only really hurts when there are just a few levels, and then it costs only a few extra queries.
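One way to explore the single-query side of this trade-off is to read every parent/child edge once and assemble the nested dictionary in memory. The sketch below is plain Python and makes no ORM calls. The function name `build_descendants_tree` and the `edges` argument are hypothetical; in Django the pairs might come from something like `Bar.objects.values_list('parent_id', 'child_id')`, which is an assumption about the through model in the question and has not been tested against django_dag.

```python
from collections import defaultdict


def build_descendants_tree(root_id, edges):
    """Build a nested dict of descendants from (parent_id, child_id) pairs.

    Assumes the edges form a DAG (no cycles); a node with several
    parents simply appears under each of them.
    """
    children = defaultdict(list)
    for parent_id, child_id in edges:
        children[parent_id].append(child_id)

    def subtree(node_id):
        # .get avoids adding empty entries for leaf nodes
        return {c: subtree(c) for c in children.get(node_id, ())}

    return subtree(root_id)


# Hypothetical edge rows, as if fetched in a single query.
edges = [(1, 2), (1, 3), (2, 4)]
print(build_descendants_tree(1, edges))
```

This keys the tree by id rather than by model instance, and it loads the whole edge table, so whether it beats the recursive version depends on table size and would need measuring.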

5 Comments

Thanks for the info. Generally, there will only be 1-2 levels of recursion for most children, while some will have maybe 3-4. I don't know if that makes sense, but keeping with the family model: one 'parent' may have 30 children, 10 grandchildren, and 5 great-grandchildren. The data is wider than deep. Right now I'm trying to work out a way of using the all() query for comparison.
If that's the case, it may make more sense to use select_related() in your situation.
I'm just trying to tweak my loop. It's still doing a hit for each iteration rather than using the 'cached' select_related() query, so I'm not doing something right.
Ah, I'm assuming children can be NULL, right? select_related() doesn't follow relations with null=True by default. You need to use select_related('children') instead.
I still can't get it to use the cached query, it's still hitting every iteration. One day I'll figure it out, just keep getting back to it for a short amount of time when I can. In the meantime I'm still seeking any other ideas of methods to try.
