
I am importing data using django-import-export, but because I use ForeignKeyWidget there are a lot of database calls, making the import very slow even for a few hundred rows (checked with django-debug-toolbar).

The documentation page on bulk imports mentions the following:

"If you use ForeignKeyWidget then this can affect performance, because it reads from the database for each row. If this is an issue then create a subclass which caches get_queryset() results rather than reading for each invocation."

I believe caching the get_queryset() results could help me, but I have no idea how to do the caching. Could you help me with some example code?

I tried the following but still see the same amount of database calls:

from import_export.widgets import ForeignKeyWidget


class CachedForeignKeyWidget(ForeignKeyWidget):
    def __init__(self, model, field="pk", use_natural_foreign_keys=False, **kwargs):
        # Build the queryset once and return the same object for every row.
        self.cached_queryset = model.objects.all()

        super().__init__(model, field, use_natural_foreign_keys, **kwargs)

    def get_queryset(self, value, row, *args, **kwargs):
        return self.cached_queryset
  • I don't have a code sample but it likely means to create a subclass widget, perform the get_queryset() call once and store internally in a dict, and then subsequent lookups can use the dict. Commented Apr 10, 2024 at 13:11
  • Hi Matthew, I've tried what you mentioned (see the edited question) but the amount of calls is still the same. They are triggered by line 449 in widgets.py (self.get_queryset(value, row, **kwargs).get(**{self.field: val})) Commented Apr 10, 2024 at 14:01
  • You would have to cache the results in the widget - i.e. by storing them in a map. Some more information on optimization here and here. Commented Apr 10, 2024 at 14:10
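For what it's worth, the dict-based cache the comments describe might look something like the sketch below. It is deliberately framework-free so it can run anywhere: `CachedLookupWidget`, `load_all`, and `key` are hypothetical stand-ins for a real ForeignKeyWidget subclass, its single `self.get_queryset(value, row)` call, and `self.field`, not names from django-import-export.

```python
class CachedLookupWidget:
    """Sketch of the dict-cache idea: one bulk read, then dict lookups.

    Hypothetical stand-in for a ForeignKeyWidget subclass; ``load_all``
    plays the role of ``self.get_queryset(value, row)`` and ``key`` the
    role of ``self.field``.
    """

    def __init__(self, load_all, key="pk"):
        self._load_all = load_all  # e.g. lambda: model.objects.all()
        self._key = key
        self._cache = None         # built lazily, on the first clean()

    def clean(self, value):
        if self._cache is None:
            # One bulk read instead of one SELECT per imported row.
            self._cache = {getattr(obj, self._key): obj
                           for obj in self._load_all()}
        try:
            return self._cache[value]
        except KeyError:
            raise ValueError(f"no match for {self._key}={value!r}")
```

The point is that returning a cached queryset from get_queryset() is not enough on its own: the widget then calls .get(**{self.field: val}) on that queryset, and QuerySet.get() clones the queryset and issues a fresh SELECT for every row. The per-row lookup itself has to go through the dict.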

1 Answer


I have tested this using the bulk_import script.

My view is that the QuerySet cache is being used: although you see the SELECT SQL output for the relation table, the cache is hit, so the performance cost is minimal.

I can see in my logs that the lookup on the FK table takes negligible time, even when importing 100k rows.

I also see, with a breakpoint set in the caching code, that the cache path is being hit during import.

If you are seeing a slow import, then there must be something else going on, so it would be useful to understand what the issue is.

Please do your own testing but my view is that the Django QuerySet cache is working correctly and there should be no need to implement your own caching. I'll update the documentation.
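If you do test this yourself, the thing to compare is the number of statements issued per imported row versus per import. Outside a Django project, the two strategies can be modelled with nothing but stdlib sqlite3; the table name, row counts, and repeated FK values below are made up purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(i, f"author-{i}") for i in range(1, 101)])

# 300 "imported rows" that reference only 3 distinct authors.
rows_to_import = [1, 2, 3] * 100

# Strategy 1: one SELECT per imported row (an uncached FK lookup).
per_row_queries = 0
for pk in rows_to_import:
    per_row_queries += 1
    conn.execute("SELECT name FROM author WHERE id = ?", (pk,)).fetchone()

# Strategy 2: one bulk read into a dict, then in-memory lookups.
bulk_queries = 1
cache = dict(conn.execute("SELECT id, name FROM author"))
for pk in rows_to_import:
    cache[pk]

print(per_row_queries, bulk_queries)  # 300 vs 1
```

If your import shows the first pattern (statement count growing with the row count), the widget is not caching; a flat handful of queries per import suggests it is.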
