
If I need a total of all objects in a queryset as well as a slice of field values from those objects, which option would be better considering speed and application memory use? (I am using a PostgreSQL backend.)

Option a:

def get_data():
    queryset = MyObject.objects.all()
    total_objects = queryset.count()
    thumbs = queryset[:5].values_list('thumbnail', flat=True)
    return {'total_objects': total_objects, 'thumbs': thumbs}

Option b:

def get_data():
    objects = list(MyObject.objects.all())
    total_objects = len(objects)
    thumbs = [o.thumbnail for o in objects[:5]]
    return {'total_objects': total_objects, 'thumbs': thumbs}

If I understand things correctly, and certainly correct me if I am wrong:

Option a: It will hit the database twice, and only total_objects (an integer) and thumbs (a list of strings) end up in memory.

Option b: It will hit the database once, but a list of all objects and all their field data ends up in memory, in addition to the items from option a.

Considering these options, and that there are potentially millions of instances of MyObject: is the speed of two cheap database hits (option a) preferable to the memory consumption of a single database hit (option b)?

My priority is for overall speed in returning the data, but I am concerned about the larger memory consumption slowing things down even more than the extra database hit.
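The memory gap between the two options can be sketched without Django at all. The following back-of-the-envelope comparison uses plain Python objects; the row count, tuple shape, and "thumb_N.jpg" values are illustrative stand-ins, not measurements from a real queryset:

```python
import sys

N = 1_000_000

# Option (b): every row is materialised as a Python object.
rows = [("thumb_%d.jpg" % i,) for i in range(N)]
total_b = len(rows)
thumbs_b = [r[0] for r in rows[:5]]
size_b = sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# Option (a): only the integer count and five strings ever exist
# in the application; COUNT(*) and LIMIT run inside the database.
total_a = N
thumbs_a = ["thumb_%d.jpg" % i for i in range(5)]
size_a = sys.getsizeof(total_a) + sum(sys.getsizeof(t) for t in thumbs_a)

print(size_b // size_a)  # option (b) holds orders of magnitude more
```

Real model instances carry far more per-row overhead than a one-element tuple, so the gap in practice is even larger than this sketch suggests.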

    Measuring it on your actual system is the only way to be 100% sure, but my money's on the first approach. Transferring millions of rows out of the database just so you can count them and use the first five is a lot more work than just running two queries. Commented Feb 2, 2014 at 22:21
  • General advice on database queries is: if you can do something in your DBMS, then do it in your DBMS. Most of the time database operations are faster than doing the same thing in your code. Commented Feb 2, 2014 at 22:29

1 Answer


Using SQL is the fastest method and will generally beat the Python equivalent, even if it hits the database more often; the extra round trip is negligible in comparison. Remember, that's what SQL is meant to do: filter, count, and slice data efficiently.

Anyway, running a thousand loops using timeit, these are the results:

In [8]: %timeit get_data1() # Using ORM
1000 loops, best of 3: 628 µs per loop

In [9]: %timeit get_data2() # Using python
1000 loops, best of 3: 1.54 ms per loop

As you can see, the first method takes 628 microseconds per loop, while the second one takes 1.54 milliseconds. That's almost 2.5 times as much! A clear winner.

I used an SQLite database with only 100 objects in it (I used autofixture to populate the models). I'm guessing PostgreSQL will return different numbers, but I am still in favor of the first one.
