
Currently this code seems to hit the database (Postgres) two to four times on every loop iteration: first to get (or create) the Type, then to get (or create) the Component. Is there a way to do this in fewer database trips?

models.py:

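from django.contrib.postgres.fields import ArrayField
from django.db import models


class Component(models.Model):
    long = models.TextField()
    # Django ships no SmallForeignKey; assuming the built-in ForeignKey was intended.
    type = models.ForeignKey('Type', models.CASCADE)


class Type(models.Model):
    type = models.TextField(unique=True)


class Point(models.Model):
    # ArrayField lives in django.contrib.postgres; the default must be a callable.
    components = ArrayField(models.IntegerField(), default=list)

    def save_components(self, geocode):
        _components = []
        for c in geocode:
            # get_or_create() returns an (object, created) tuple.
            ct, _ = Type.objects.get_or_create(type=c['types'][0])
            component, _ = Component.objects.get_or_create(long=c['long_name'], type=ct)
            _components.append(component.pk)
        self.components = _components
        self.save()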

Incoming data:

geocode = [
    {
       "long_name" : "Luray",
       "types" : [ "locality", "political" ]
    },
    {
       "long_name" : "Page County",
       "types" : [ "administrative_area_level_2", "political" ]
    },
    {
       "long_name" : "Virginia",
       "types" : [ "administrative_area_level_1", "political" ]
    },
    {
       "long_name" : "United States",
       "types" : [ "country", "political" ]
    }
]
  • OK, but ct and component don't come out of nowhere; they are fetched from the database. How can you get that information without hitting the database? Commented Jan 9, 2017 at 17:26
  • @ShangWang Well I suppose it'd be possible to cache it locally, but that's not the question. The goal is to reduce the number of trips, possibly to 1 or 2. I suspect that would be possible given that all the information is already there at the time the first trip is made. Commented Jan 9, 2017 at 17:29
  • Why are you storing an array of primary keys, rather than using a ManyToManyField? Commented Jan 9, 2017 at 17:34
  • First, you have to query whether that Type and Component exist; each costs one SQL query, no doubt about that. If they don't exist, you have to create them, so another 1 or 2 INSERT statements will be sent to the database. I'm not sure what you mean by all the information already being there: don't you want to create them if they are not stored? Also, as @DanielRoseman said: docs.djangoproject.com/en/1.10/topics/db/examples/many_to_many Commented Jan 9, 2017 at 17:35
  • @DanielRoseman I've been on the fence about whether I should use an array or a through table... I already have the code written for using M2M if I ever wanted to switch back (which is even more bloated), but arrays make far more sense for the queries I want to use. Can we kindly get back on topic now? Commented Jan 9, 2017 at 17:39

1 Answer

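Much of the time, Django does a decent job of caching database results, though only within a single evaluated QuerySet; each get_or_create() call still issues its own query. If you want more control, you could do something like this (provided that you do not have too many types):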

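class Point(models.Model):
    # ArrayField is imported from django.contrib.postgres.fields.
    components = ArrayField(models.IntegerField(), default=list)

    def save_components(self, geocode):
        _components = []
        # One query loads every existing Type into a name -> instance cache.
        _types = {t.type: t for t in Type.objects.all()}
        for c in geocode:
            type_name = c['types'][0]
            ct = _types.get(type_name)
            if ct is None:
                # Create the missing Type and cache it for later iterations.
                ct = _types[type_name] = Type.objects.create(type=type_name)
            # get_or_create() returns an (object, created) tuple.
            component, _ = Component.objects.get_or_create(long=c['long_name'], type=ct)
            _components.append(component.pk)
        self.components = _components
        self.save()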

This should save you from looking up existing types on every iteration. You can also defer creating new Types and new Components: use get() instead of get_or_create(), catch the DoesNotExist exception, and insert the missing rows in bulk later in the function (see bulk_create() in the Django documentation).
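
The deferred approach could be sketched roughly as follows. This is only an illustration under some assumptions: the database is PostgreSQL (where bulk_create() sets primary keys on the objects it returns), and no other process inserts the same rows concurrently. It replaces the per-row get() calls with one filter(...__in=...) query per model, which is a variation on the suggestion rather than the exact method described. The function name resolve_component_pks is hypothetical, and the model classes are passed in as arguments only so the block stays self-contained.

```python
def resolve_component_pks(geocode, Type, Component):
    """Return Component pks for `geocode` using at most four queries.

    `Type` and `Component` are the model classes from the question, passed in
    as arguments only so this sketch has no imports of its own.
    """
    # Pair each component name with its first type, as in the question.
    wanted = [(c['long_name'], c['types'][0]) for c in geocode]
    type_names = {type_name for _, type_name in wanted}

    # Query 1: fetch every wanted Type that already exists.
    types = {t.type: t for t in Type.objects.filter(type__in=type_names)}

    # Query 2 (only if needed): insert all missing Types at once.
    missing = [Type(type=name) for name in sorted(type_names - set(types))]
    if missing:
        # Assumes the backend fills in pk on the returned objects (PostgreSQL).
        for t in Type.objects.bulk_create(missing):
            types[t.type] = t

    # Query 3: fetch candidate Components, then match exact pairs in Python.
    candidates = Component.objects.filter(
        long__in={name for name, _ in wanted},
        type__in=[t.pk for t in types.values()],
    )
    components = {(c.long, c.type_id): c for c in candidates}

    # Query 4 (only if needed): insert all missing Components at once.
    new = []
    for name, type_name in wanted:
        key = (name, types[type_name].pk)
        if key not in components:
            obj = Component(long=name, type=types[type_name])
            components[key] = obj
            new.append(obj)
    if new:
        Component.objects.bulk_create(new)

    # Preserve the order of the incoming geocode list.
    return [components[(name, types[t].pk)].pk for name, t in wanted]
```

Inside save_components this would be used as self.components = resolve_component_pks(geocode, Type, Component) followed by self.save(), so the number of queries no longer grows with the length of geocode. Under concurrent writes, the bulk insert of Types could still hit the unique constraint, so the per-row get_or_create() version remains the safer choice there.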
