2

Which method of pymongo executes first? sort or limit?

I've a big collection where i need to filter for a specific discount category ("X"), e.g:

{"discount_value": 1, "name": "promotion_1", "category": ["X"]},
{"discount_value": 10, "name": "promotion_10", "category": ["X", "Y"]},
{"discount_value": 15, "name": "promotion_15", "category": ["X", "Y", "Z"]}

But the query must filter all documents by discount_values retrieving the bigger ones, lets say that i need to limit the query by 2:

.find({"category": {"$in": ["X"]}})
.limit(2)
.sort("discount_value", pymongo.DESCENDING)

This example will give me discount value 1 and 10 but what i needed was 10 and 15, how can i sort before limiting (if possible) without losing performance (the collection is really big)

1 Answer 1

2

sort() gets applied first and then limit.

Scroll down to this doc: https://docs.mongodb.com/manual/reference/method/db.collection.find/

The following statements chain cursor methods limit() and sort():

db.bios.find().sort( { name: 1 } ).limit( 5 )

db.bios.find().limit( 5 ).sort( { name: 1 } )

The two statements are equivalent; i.e. the order in which you chain the limit() and the sort() methods is not significant. Both statements return the first five documents, as determined by the ascending sort order on ‘name’.

Sort order testing

Sorting actually works correctly for me. Using MongoDB shell version v3.6.3 on Ubuntu 18.04 server, I loaded a file like this:

[{"discount_value": 1, "name": "promotion_1", "category": ["X"]},
{"discount_value": 10, "name": "promotion_10", "category": ["X", "Y"]},
{"discount_value": 15, "name": "promotion_15", "category": ["X", "Y", "Z"]}]

using mongoimport --db test --collection test1 --drop --file testing.txt --jsonArray

From mongo prompt, I tried to find and sort on discount_value descending and saw that 15 was on the top.

> db.test1.find({"category": {"$in": ["X"]}}).sort( {discount_value: -1} )
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df55"), "discount_value" : 1, "name" : "promotion_1", "category" : [ "X" ] }

Limit testing

Notice how limit being before or after sort doesn't a difference in the output.

Limit after sort gives the same result as limit before sort.

> db.test1.find({"category": {"$in": ["X"]}}).sort( {discount_value: -1} ).limit(2)
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }

vs.

> db.test1.find({"category": {"$in": ["X"]}}).limit(2).sort( {discount_value: -1} )
{ "_id" : ObjectId("5cb4beefea2d524413d8df57"), "discount_value" : 15, "name" : "promotion_15", "category" : [ "X", "Y", "Z" ] }
{ "_id" : ObjectId("5cb4beefea2d524413d8df56"), "discount_value" : 10, "name" : "promotion_10", "category" : [ "X", "Y" ] }

Sorting performance

The best I can tell you is to use indexes as suggested in their manual https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/ and use explain to understand where query bottlenecks exists for your kind of workload with https://docs.mongodb.com/manual/reference/method/db.collection.explain/#db.collection.explain

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.