
If I need a total of all objects in a queryset as well as a slice of field values from those objects, which option would be better considering speed and application memory use? (I am using a PostgreSQL backend.)

Option a:

def get_data():
    queryset = MyObject.objects.all()
    total_objects = queryset.count()
    thumbs = queryset[:5].values_list('thumbnail', flat=True)
    return {'total_objects': total_objects, 'thumbs': thumbs}

Option b:

def get_data():
    objects = list(MyObject.objects.all())
    total_objects = len(objects)
    thumbs = [o.thumbnail for o in objects[:5]]
    return {'total_objects': total_objects, 'thumbs': thumbs}

If I understand things correctly, and certainly correct me if I am wrong:

Option a: It will hit the database twice, and only total_objects (an integer) and thumbs (a list of strings) end up in memory.

Option b: It will hit the database once, but a list of all objects and all their field data ends up in memory, in addition to the items from option a.

Considering these options, and that there are potentially millions of instances of MyObject: is the speed of two cheap database hits (option a) preferable to the memory consumption of a single database hit (option b)?

My priority is for overall speed in returning the data, but I am concerned about the larger memory consumption slowing things down even more than the extra database hit.
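The memory gap between the two options can be sketched without Django at all. The following back-of-the-envelope comparison uses plain Python objects; the row count, tuple shape, and "thumb_N.jpg" values are illustrative stand-ins, not measurements from a real queryset:

```python
import sys

N = 1_000_000

# Option (b): every row is materialised as a Python object.
rows = [("thumb_%d.jpg" % i,) for i in range(N)]
total_b = len(rows)
thumbs_b = [r[0] for r in rows[:5]]
size_b = sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# Option (a): only the integer count and five strings ever exist
# in the application; COUNT(*) and LIMIT run inside the database.
total_a = N
thumbs_a = ["thumb_%d.jpg" % i for i in range(5)]
size_a = sys.getsizeof(total_a) + sum(sys.getsizeof(t) for t in thumbs_a)

print(size_b // size_a)  # option (b) holds orders of magnitude more
```

Real model instances carry far more per-row overhead than a one-element tuple, so the gap in practice is even larger than this sketch suggests.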

    Measuring it on your actual system is the only way to be 100% sure, but my money's on the first approach. Transferring millions of rows out of the database just so you can count them and use the first five is a lot more work than just running two queries. Commented Feb 2, 2014 at 22:21
  • General advice on database queries is: if you can do something in your DBMS, then do it in your DBMS. Most of the time database operations are faster than doing the same thing in your code. Commented Feb 2, 2014 at 22:29

1 Answer


Using SQL is the fastest method and will generally beat the Python equivalent, even if it hits the database more often; the extra round trip is negligible in comparison. Remember, that's what SQL is meant to do: filter, count, and slice data efficiently.

Anyway, running a thousand loops using timeit, these are the results:

In [8]: %timeit get_data1() # Using ORM
1000 loops, best of 3: 628 µs per loop

In [9]: %timeit get_data2() # Using python
1000 loops, best of 3: 1.54 ms per loop

As you can see, the first method takes 628 microseconds per loop, while the second one takes 1.54 milliseconds. That's almost 2.5 times as much! A clear winner.

I used an SQLite database with only 100 objects in it (I used autofixture to populate the models). I'm guessing PostgreSQL will return different numbers, but I am still in favor of the first one.
