I have a module that is expensive to import (importing it downloads a ~20MB index file) and that is used by a Celery worker. Unfortunately, I can't figure out how to have the module imported only once, and only by the Celery worker.
Version 1 tasks.py file:
    from celery import shared_task

    import expensive_module  # module-level import: runs in every process that imports tasks.py

    @shared_task
    def f():
        expensive_module.do_stuff()
When I organize the file this way, the expensive module is imported by both the web server and the Celery instance, which is what I'd expect, since the tasks module is imported in both and they're different processes.
Version 2 tasks.py file:
    from celery import shared_task

    @shared_task
    def f():
        import expensive_module  # deferred import: runs only when the task executes
        expensive_module.do_stuff()
In this version the web server never imports the module (which is good), but the module gets re-imported by the celery worker every time f.delay() is called. This is what really confuses me. In this scenario, why is the module re-imported every time this function is run by the celery worker? How can I re-organize this code to have only the celery worker import the expensive module, and have the module imported only once?
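To narrow this down, here is a minimal diagnostic sketch. The helper name `import_with_trace` is hypothetical and not part of Celery or Django. It relies only on the standard behaviour that Python caches imported modules in `sys.modules` per process, so a function-level import should execute the module body at most once in any given process. The helper records the process ID and whether the module was already cached before the import ran.

```python
import importlib
import os
import sys


def import_with_trace(module_name):
    """Import a module lazily and report whether this process already had it cached.

    Returns a tuple (pid, already_loaded, module):
      pid            - ID of the process performing the import
      already_loaded - True if the module was in sys.modules before this call
      module         - the imported module object
    """
    already_loaded = module_name in sys.modules
    module = importlib.import_module(module_name)
    return os.getpid(), already_loaded, module
```

Calling `import_with_trace("expensive_module")` inside `f()` and logging the first two values should show which case applies. If `already_loaded` is `False` on every call while the PID keeps changing, the calls are landing in different (or freshly started) worker child processes, each of which has to import the module once for itself. If it is `True`, the import statement is only hitting the cache and something else is triggering the download.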
As a follow-on, less important question: in Version 1 of the tasks.py file, why does the web instance import the expensive module twice? Both times it's imported from urls.py when Django runs self._urlconf_module = import_module(self.urlconf_name).