0

I have a bunch of object in a response object, which I save into the database. Doing it object by object is very slow since that essentially means if its 30k objects, its gonna make 30k commits to the database afaik.

Example 1:

for obj in response['RESULTS']:

    _city = City.objects.create(
        id=obj['id'],
        name=obj['name'],
        shortname=obj['shortname'],
        location=obj['location'],
        region=region_fk
    )

    _events = Event.objects.get(pk=obj['Event'])
    _events.city_set.add(_city)

My new approach to implement bulk_create() is something like this:

Example 2:

bulk_list = []

for obj in response['RESULTS']:

    # get the foreignkey instead of duplicating data

    if obj.get('Region'):
        region_fk = Region.objects.get(pk=obj['Region'])      

    bulk_list.append(
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=obj['shortname'],
            location=obj['location'],
            region=region_fk
        )
    )

bulk_save = City.objects.bulk_create(bulk_list)

While this is alot faster than my previous attempt it has a problem, now I dont know how to add my M2M relationships.

models.py

class City(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    name = models.CharField(max_length=32)
    shortname = models.CharField(max_length=32)
    location = models.CharField(max_length=32)
    region = models.ForeignKey(max_length=32)
    events = models.ManyToManyField(Event)


class Event(models.Model):

    id = models.CharField(primary_key=True, max_length=64)
    description = models.TextField()
    date = models.DateTimeField()

class Region(models.Model):

    id = models.IntegerField(primary_key=True)

Questions

I have looked around on stackoverflow and have found some examples but I dont understand them completely.. it seems like most answers talks about bulk_create the M2M relations aswell via a through model, and im not sure thats what I am looking for.

  1. How can I add those M2M relations?
  2. Please break it down so I can understand, I want to learn :-)

Any help or pointers are highly appreciated. Thank you.

Other information

I run:

  • postgresql
  • django==1.11

Related posts

Django docs on this topic

Response example:

"RESULT": [
  {
    "City": [
      {
        "id": "349bc6ab-1c82-46b9-889e-2cc534d5717e",
        "name": "Stockholm",
        "shortname": "Sthlm",
        "location": "Sweden",
        "region": [
          2
        ],
        "events": [
          {
            "id": "989b6563-97d2-4b7d-83a2-03c9cc774c21",
            "description": "some text",
            "date": "2017-06-19T00:00:00"
          },
          {
            "id": "70613514-e569-4af4-b770-a7bc9037ddc2",
            "description": "some text",
            "date": "2017-06-20T00:00:00"
          },
            {
            "id": "7533c16b-3b3a-4b81-9d1b-af528ec6e52b",
            "description": "some text",
            "date": "2017-06-22T00:00:00"
          },
      }
  }
]
2
  • Please add the related models to your question. Commented Jun 18, 2017 at 21:29
  • I have added models.py for completion. Thank you Commented Jun 18, 2017 at 21:39

1 Answer 1

0

It depends.

If you have M2M relationship without explicit through model, then possible solution with Django ORM would be:

from itertools import groupby

# Create all ``City`` objects (like you did in your second example):
cities = City.objects.bulk_create(
    [
        City(
            id=obj['id'],
            name=obj['name'],
            shortname=['shortname'],
            location=['location'],
            region=['region']
        ) for obj in response['RESULTS']
    ]
)

# Select all related ``Event`` objects.
events = Event.objects.in_bulk([obj['Event'] for obj in response['RESULTS']])

# Add all related cities to corresponding events:
for event_id, event_cities_raw in groupby(response['RESULTS'], lambda x: x['Event']):
    event = events[event_id]
    # To avoid DB queries we can gather all cities ids from response
    city_ids = [city['id'] for city in event_cities_raw]
    # And get saved objects from bulk_create result, which are required for ``add`` method.
    event_cities = [city for city in cities if city.pk in city_ids]
    event.city_set.add(*event_cities)

1 bulk_create query, 1 in_bulk query + 1 query for each unique Event in response (event.city_set.add does single UPDATE query by default).

With explicit through model it should be possible to use another bulk_create for this model, in other words replace all event.city_set.add queries with single ExplicitThrough.objects.bulk_create.

Possibly, you'll need to handle situation when Event from response['RESULTS'] doesn't exist, then you'll have to create these objects with another bulk_create.

Addressing your comment:

If there will be some events in response['RESULTS'] which don't exist in the DB. In this case you can do another bulk_create just under Event.objects.in_bulk query:

new_events = Event.objects.create_bulk([obj['Event'] for obj in response['RESULTS'] if obj['Event']['id'] not in events])

But here, it depends on the object structure in response['RESULTS']. But in general you need to create missing events here. It should be faster than using Event.objects.get_or_create calls.

Sign up to request clarification or add additional context in comments.

10 Comments

Thank you for a interesting and explanatory reply, this seems to work - but you are right about that I need to handle the situation when Event does not exist in the reply and in the database either. Since I am quite new to python and django, where would this check need to happen? edit: sent reply before I was done writing the first one, fixed
I updated the answer. Could you, please, specify how single element from response['RESULTS'] looks like?
Hello, I have updated with a response example in json in original post. In the code you provided you say event = events[event_id] and then event.city_set.add(*event_cities) - doesnt event need to be an instance of a event? like event = Event.objects.get(event_id=obj['id'])? and then event.city_set.add(*event_cities)? Sorry if im confusing you with my own confusion :)
[..] further thinking about this, what if the Event also had a foreignkey field to some other model? What would happen then? Currently I am a little mind boggled about this, it seems I need something that ties everything together to fully understand what is going on here.
in_bulk returns a dictionary like this {<id>: <Event instance>}, so this code should work :)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.