
I have the following models:

class LocationPoint(models.Model):
    latitude = models.DecimalField(max_digits=16, decimal_places=12)
    longitude = models.DecimalField(max_digits=16, decimal_places=12)

    class Meta:
        unique_together = (
            ('latitude', 'longitude',),
        )

class GeoLogEntry(models.Model):
    device = models.ForeignKey(Device, on_delete=models.PROTECT)
    location_point = models.ForeignKey(LocationPoint, on_delete=models.PROTECT)
    recorded_at = models.DateTimeField(db_index=True)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)

I have lots of incoming records to create (probably thousands at once).

Currently I create them like this:

# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
points_models = map(lambda point: LocationPoint(latitude=point.latitude, longitude=point.longitude), points)

LocationPoint.objects.bulk_create(
     points_models,
     ignore_conflicts=True
)

# Simplified map function contents (removed mapping from dict as it's unrelated to the question topic)
geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device,
        location_point=LocationPoint.objects.get(
            latitude=log_entry.latitude,
            longitude=log_entry.longitude
        ),
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

But I think it's not very efficient, because it runs N SELECT queries for N records. Is there a better way to do that?

I use Python 3.9, Django 3.1.2 and PostgreSQL 12.4.

2 Comments
  • I assume it should be lambda point: LocationPoint(latitude=point.latitude, ...), i.e. point.latitude rather than a bare latitude? Commented Oct 10, 2020 at 20:25
  • I can also recommend one dirty solution: execute that part asynchronously using something like Celery, if you don't actually need to return the created objects in the response. Commented Oct 10, 2020 at 20:28

2 Answers


The main problem is fetching the objects to link to in bulk. We can fetch them in a single query once all of these objects have been stored:

from django.db.models import Q

points_models = [
    LocationPoint(latitude=point.latitude, longitude=point.longitude)
    for point in points
]

LocationPoint.objects.bulk_create(
     points_models,
     ignore_conflicts=True
)

qfilter = Q(
    *[
        Q(('latitude', entry.latitude), ('longitude', entry.longitude))
        for entry in log_entries
    ],
    _connector=Q.OR
)


data = {
    (lp.longitude, lp.latitude): lp.pk
    for lp in LocationPoint.objects.filter(qfilter)
}

geo_log_entries = [
    GeoLogEntry(
        device=entry.device,
        location_point_id=data[entry.longitude, entry.latitude],
        recorded_at=entry.recorded_at
    )
    for entry in log_entries
]

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

We thus fetch all the objects that we need to link to in bulk (so with one query), build a dictionary that maps the longitude and latitude to the primary key, and then set location_point_id to that primary key.

It is however important to use decimals, or at least a type that will match exactly. Floating-point numbers are tricky, since they can easily have rounding errors (this is why longitudes and latitudes are often stored as "fixed point" numbers, for example integers scaled up by a factor of 1,000 or 1,000,000). Otherwise you need an algorithm that matches the incoming values with the data that comes back from the query.
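To illustrate the matching pitfall: values only make reliable dictionary keys once both sides are normalized to Decimal with the same exponent. A small self-contained sketch, where the quantize step mirrors the model's decimal_places=12 (as_key is a hypothetical helper, not part of the answer's code):

```python
from decimal import Decimal

TWELVE_PLACES = Decimal("1.000000000000")  # mirrors decimal_places=12

def as_key(value):
    """Normalize a latitude/longitude to a hashable fixed-point key."""
    return Decimal(str(value)).quantize(TWELVE_PLACES)

# A float from the incoming data and the Decimal the database hands
# back compare (and hash) equal once both are quantized:
assert as_key(55.7558) == as_key(Decimal("55.755800000000"))
```

Running incoming coordinates through such a helper before using them as dictionary keys avoids silent KeyError-style mismatches between floats and DecimalField values.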


3 Comments

Thanks! That's much faster for queries (86 ms vs 1100 ms) on 1000 records, but still slow on the Python side (4.4 s vs 16 s). Any tips on how to optimize it?
@artem: for a dictionary lookup this should not be that slow. For a linear search it is of course a different story. You could profile where exactly the performance gap is located.
Seems that it's mostly debug template rendering time (plus debug-toolbar overhead), and the time to create the data on the Python side grows quickly with the record count (10k records take 20 seconds), but I think I'll optimize it. Thanks again :)

bulk_create(...) will return the created objects as a list. You can match against those objects on the Python side, instead of making queries to your DB, as they are already fetched.

location_points = LocationPoint.objects.bulk_create(
     points_models,
     ignore_conflicts=True
)

geo_log_entries = map(
    lambda log_entry: GeoLogEntry(
        device=device, 
        location_point=get_location_point(log_entry, location_points),      
        recorded_at=log_entry.recorded_at
    ),
    log_entries
)

GeoLogEntry.objects.bulk_create(geo_log_entries, ignore_conflicts=True)

All you need to do is implement get_location_point to satisfy your needs.
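The answer leaves get_location_point open; a per-entry linear scan over location_points would be O(n·m), so the usual approach is a one-off dictionary keyed on the coordinates. A hypothetical sketch (both helpers are assumptions, and the index is built once and passed in place of the raw list):

```python
def build_point_index(location_points):
    """Build a one-off (latitude, longitude) -> LocationPoint index."""
    return {(p.latitude, p.longitude): p for p in location_points}

def get_location_point(log_entry, point_index):
    """Resolve a log entry to its already-created LocationPoint."""
    return point_index[(log_entry.latitude, log_entry.longitude)]
```

One caveat: the Django docs note that enabling ignore_conflicts disables setting the primary key on the returned instances, so on most setups these objects cannot be assigned to location_point directly.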

8 Comments

The problem is that for most databases it will not fill in the primary key on the returned objects, so one cannot use them to assign a value to location_point.
Yep, PK is null for the created objects.
@WillemVanOnsem AFAIK using Postgres 12 along with Django 3 is sufficient for primary keys to be set. OP specified that he is using Django 3.1.2 and PostgreSQL 12.4.
@artem hmm, OK then. I just thought that I had done that once...
The Django docs do say that primary keys must be set, though.
