9

I have a Django app running in production. Its database has a main write instance and a few read replicas. I use DATABASE_ROUTERS to route between the write instance and the read replicas depending on whether I need to read or write.

I encountered a situation where I have to do some async processing on an object in response to a user request. The order of actions is:

  1. User submits a request via HTTPS/REST.
  2. The view creates an Object and saves it to the DB.
  3. The view triggers a Celery job to process the object outside of the request-response cycle, passing the object ID to it.
  4. The view sends an OK response to the request.

Now, the Celery job may kick in after 10 ms or 10 minutes, depending on the queue. When it finally runs, the Celery job first tries to load the object based on the ID provided. Initially I had issues doing my_obj = MyModel.objects.get(pk=given_id), because the read replica would be used at that point: if the queue is empty and the Celery job runs immediately after being triggered, the object may not have propagated to the read replicas yet.

I resolved that issue by replacing my_obj = MyModel.objects.get(pk=given_id) with my_obj = MyModel.objects.using('default').get(pk=given_id) -- this ensures the object is read from my write-db-instance and is always available.

However, now I have another issue I did not anticipate.

Calling my_obj.certain_many_to_many_objects.all() triggers another call to the database, as the ORM is lazy. That call IS being done on the read replica. I was hoping it would stick to the database I specified with using, but that's not the case. Is there a way to force all sub-element objects to use the same write-db-instance?

1
  • Doesn't using my_obj.certain_many_to_many_objects.all().using('default') work? Commented Sep 7, 2021 at 23:30

4 Answers

4
+500

I suspect your custom database router needs a tweak. The default behaviour without a custom router should provide the database stickiness you require:

The default routing scheme ensures that objects remain ‘sticky’ to their original database (i.e., an object retrieved from the foo database will be saved on the same database). [...] You don’t have to do anything to activate the default routing scheme – it is provided ‘out of the box’ on every Django project.

From Automatic DB Routing

So your DB router just needs to offer this behaviour up front, as it is probably the Right Thing To Do in 99.9% of cases.

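class MyRouter:  # placeholder name; use your existing router class
    def db_for_read(self, model, **hints):
        # stay on the database the parent instance was loaded from, if any
        instance = hints.get('instance')
        if instance is not None and instance._state.db:
            return instance._state.db
        # else return your read replica
        return 'read-only'  # or whatever it's called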

See django/db/utils.py
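To see the stickiness in isolation, here is a minimal, framework-free sketch. The class name StickyReplicaRouter, the aliases 'default' and 'read-only', and the SimpleNamespace stand-ins for model instances are all assumptions for illustration; in a real project Django supplies the actual model instance in hints['instance'] and the router is listed in DATABASE_ROUTERS.

```python
from types import SimpleNamespace


class StickyReplicaRouter:
    """Hypothetical router: reads go to a replica unless an instance hint
    says the parent object was loaded from a specific database."""

    primary_alias = "default"    # assumed alias of the write instance
    replica_alias = "read-only"  # assumed alias of a read replica

    def db_for_read(self, model, **hints):
        instance = hints.get("instance")
        if instance is not None and instance._state.db:
            # Stay on the database the parent object came from.
            return instance._state.db
        return self.replica_alias

    def db_for_write(self, model, **hints):
        return self.primary_alias

    def allow_relation(self, obj1, obj2, **hints):
        # Primary and replicas hold the same data, so relations are allowed.
        return True


def fake_instance(db_alias):
    """Stand-in for a model instance; only `_state.db` matters here."""
    return SimpleNamespace(_state=SimpleNamespace(db=db_alias))


router = StickyReplicaRouter()
loaded_from_primary = fake_instance("default")
not_yet_saved = fake_instance(None)

# Related lookup on an object fetched with .using('default')
read_with_parent = router.db_for_read(model=None, instance=loaded_from_primary)
# Plain query with no parent object
read_without_parent = router.db_for_read(model=None)
# Parent object that has no database recorded yet
read_with_unsaved_parent = router.db_for_read(model=None, instance=not_yet_saved)
```

With a router along these lines, an object fetched via MyModel.objects.using('default').get(...) should keep its related-manager queries on 'default', because related managers pass the parent object as the instance hint. Objects loaded from a replica stay on that replica in the same way.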


1

Doesn't using my_obj.certain_many_to_many_objects.all().using('default') work?

The .all() call returns a queryset, so you should be able to chain .using(..) onto it and have it work.


1

Model managers and the QuerySet API can be used to change which database replica is queried. There is a way to specify which DB connection to use with Django: for each model manager, Django's BaseManager class uses a private attribute, self._db, to hold the DB connection alias, and you may specify another value instead.

from django.db import models


class MyModelRelationQuerySet(models.QuerySet):
    def filter_on_my_obj(self, given_id):
        # perform the base query you want
        return self.filter(relation__fk=given_id)


class MyModelManager(models.Manager):

    # bypass self._db on the BaseManager class
    def get_queryset(self):
        # the proper way to pass "using" would be using=self._db;
        # for your case you may pass the alias of your write instance
        # ('default' in the question)
        return MyModelRelationQuerySet(self.model, using='default')

    def my_obj_filter(self, given_id):
        # delegate to the custom queryset method defined above
        return self.get_queryset().filter_on_my_obj(given_id)


# pass the model manager to the model
class MyModel(models.Model):
    # ...
    objects = MyModelManager()

See the Django documentation on creating custom QuerySets for model managers.

Reading the source code of Django's models.Manager and QuerySet can also be insightful for such advanced issues with querying the database.


0

The first step to using more than one database with Django is to tell Django about the database servers you’ll be using. This is done using the DATABASES setting. This setting maps database aliases, which are a way to refer to a specific database throughout Django, to a dictionary of settings for that specific connection.

Most django-admin commands that interact with the database operate on one database at a time, using --database to control which one. An exception to this rule is the makemigrations command. It validates the migration history in the databases to catch problems with the existing migration files (which could be caused by editing them) before creating new migrations. By default, it checks only the default database, but it consults the allow_migrate() method of routers if any are installed.

class AuthRouter:
    route_app_labels = {'auth', 'contenttypes'}

    def db_for_read(self, model, **hints):
        """
        Attempts to read auth and contenttypes models go to auth_db.
        """
        if model._meta.app_label in self.route_app_labels:
            return 'auth_db'
        return None

db_for_read(model, **hints): Suggest the database that should be used for read operations for objects of type model.

If a database operation is able to provide any additional information that might assist in selecting a database, it will be provided in the hints dictionary. Details on valid hints are provided below.

Returns None if there is no suggestion.
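Returning None is what allows routers to be chained and, when none of them has an opinion, lets Django fall back to the instance's own database. The sketch below is a simplified, framework-free imitation of that lookup, paraphrasing the logic in django/db/utils.py rather than copying it. The function route_read, the helpers fake_model and fake_instance, and the aliases 'replica1' and the app label 'shop' are hypothetical names used only for this illustration.

```python
from types import SimpleNamespace

DEFAULT_DB_ALIAS = "default"


class AuthRouter:
    route_app_labels = {"auth", "contenttypes"}

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return "auth_db"
        return None  # no suggestion; let the next router decide


def route_read(routers, model, **hints):
    """Simplified imitation of how the router chain is consulted for reads."""
    for router in routers:
        method = getattr(router, "db_for_read", None)
        if method is None:
            continue  # this router does not implement db_for_read
        chosen = method(model, **hints)
        if chosen:
            return chosen
    # No router had a suggestion: stick to the instance's database, if known.
    instance = hints.get("instance")
    if instance is not None and instance._state.db:
        return instance._state.db
    return DEFAULT_DB_ALIAS


def fake_model(app_label):
    """Stand-in for a model class; only `_meta.app_label` matters here."""
    return SimpleNamespace(_meta=SimpleNamespace(app_label=app_label))


def fake_instance(db_alias):
    """Stand-in for a model instance; only `_state.db` matters here."""
    return SimpleNamespace(_state=SimpleNamespace(db=db_alias))


routers = [AuthRouter()]
auth_choice = route_read(routers, fake_model("auth"))
other_choice = route_read(routers, fake_model("shop"))
sticky_choice = route_read(
    routers, fake_model("shop"), instance=fake_instance("replica1")
)
```

The practical consequence for the question: a db_for_read that always returns a replica alias never reaches the sticky fallback, whereas one that returns None (or checks the instance hint itself) does.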
