How can I use indexes with sort on a MongoDB $or query?

Question

I've got a collection mailCollection with indexes on the .sender and .recipient values. When I do an $or query and sort by .timestamp the entire collection is scanned. How can I index this collection or rewrite the query to grab the documents where .sender or .recipient match a specific value, sorted and limited?

mailCollection indexes:

{recipient: 1}
{sender: 1}

Slow code:

var email = "user@example.com"; // placeholder: the email address I want to query
var cursor = mailCollection.find({$or: [{sender: email}, {recipient: email}]});
cursor.sort({timestamp: 1}).limit(100).toArray(function(error, result) {
  // yikes, full collection scan
});

If it makes a difference, I'm using the MongoDB Node.js driver.

Have you tried actually adding timestamp as an index? Have you tried adding timestamp as a secondary field in a compound index to either of the existing indexes? Perhaps read "Sort and Non-prefix Subset of an Index", add indexes, and run with explain(). Use hint() if needed and then remove indexes that would not be picked up. No, the driver does not make a difference.
– Neil Lunn, Commented Oct 28, 2017 at 4:03

Markus W Mahlberg · Accepted Answer · 2017-10-28 05:19:21Z

Tricky. A three-way index intersection (sender, recipient and timestamp) does not work, so you need compound indexes instead. Make sure each compound index has a proper prefix: the field you match on comes first and the sort field comes after it. Only then can the index be used efficiently for both the match and the sort.

The data

> db.indextest.find()
{ "_id" : ObjectId("59f401893e9fcadcbf2b1694"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:03:21.468Z") }
{ "_id" : ObjectId("59f405d93e9fcadcbf2b1695"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:21:45.573Z") }
{ "_id" : ObjectId("59f408413e9fcadcbf2b1699"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:01.651Z") }
{ "_id" : ObjectId("59f408563e9fcadcbf2b169a"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:22.376Z") }
{ "_id" : ObjectId("59f408763e9fcadcbf2b169b"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:54.268Z") }
{ "_id" : ObjectId("59f4087e3e9fcadcbf2b169c"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:33:02.615Z") }

The indices

I decided to create two compound indexes, one on sender and one on recipient, each with timestamp as an additional key. This should give you efficient queries for the most common use cases:

Which messages did a given user receive, sorted by date?
Which messages did a given user send, sorted by date?
And your query ;)

This gives you the most bang for the least overhead (only one additional field per index).
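
For reference, a minimal sketch of how two such indexes could be created from the Node.js driver. The helper name `ensureMailIndexes` is made up for this example, and `collection` is assumed to be a driver `Collection` object (for instance `db.collection('mailCollection')`). `createIndex` is the driver's own method and returns a promise that resolves to the index name.

```javascript
// Hypothetical helper (not part of the driver): creates the two compound
// indexes discussed above. `collection` is assumed to be a driver Collection.
async function ensureMailIndexes(collection) {
  // Match field first, sort field second, so that each $or branch
  // can return its matches already ordered by timestamp.
  const specs = [
    { sender: 1, timestamp: 1 },
    { recipient: 1, timestamp: 1 },
  ];
  const names = [];
  for (const spec of specs) {
    names.push(await collection.createIndex(spec));
  }
  return names;
}
```

In the mongo shell the equivalent would be `db.indextest.createIndex({sender: 1, timestamp: 1})`, and likewise for `recipient`.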

Given the indices

> db.indextest.getIndices()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.indextest"
    },
    {
        "v" : 1,
        "key" : {
            "recipient" : 1,
            "timestamp" : 1
        },
        "name" : "recipient_1_timestamp_1",
        "ns" : "test.indextest"
    },
    {
        "v" : 1,
        "key" : {
            "sender" : 1,
            "timestamp" : 1
        },
        "name" : "sender_1_timestamp_1",
        "ns" : "test.indextest"
    }
]

The result

Running your query:

> db.indextest.find({$or:[{sender:"[email protected]"},{recipient:"[email protected]"}]}).sort({timestamp:1}).explain()

gives the expected result (edited for brevity):

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.indextest",
        "indexFilterSet" : false,
...
        "winningPlan" : {
            "stage" : "SUBPLAN",
            "inputStage" : {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "SORT_MERGE",
                    "sortPattern" : {
                        "timestamp" : 1
                    },
                    "inputStages" : [
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "recipient" : 1,
                                "timestamp" : 1
                            },
                            "indexName" : "recipient_1_timestamp_1",
                            "isMultiKey" : false,
...
                            "direction" : "forward",
                            "indexBounds" : {
                                "recipient" : [
                                    "[\"[email protected]\", \"[email protected]\"]"
                                ],
                                "timestamp" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        },
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "sender" : 1,
                                "timestamp" : 1
                            },
                            "indexName" : "sender_1_timestamp_1",
                            "isMultiKey" : false,
...
                            "direction" : "forward",
                            "indexBounds" : {
                                "sender" : [
                                    "[\"[email protected]\", \"[email protected]\"]"
                                ],
                                "timestamp" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
...
    "ok" : 1
}
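
If you want to check a plan like this from code rather than by eye, one option is to walk the plan tree and list the stage names. The following is only a sketch: `collectStages` is a hypothetical helper, and the `winningPlan` object is a hand-trimmed copy of the output above, not something the driver returns in exactly this shape.

```javascript
// Hypothetical helper: walks an explain() plan tree and lists every stage name.
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  stages.push(plan.stage);
  // A plan node has either a single child (inputStage) or several (inputStages).
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// Hand-trimmed copy of the winning plan shown above.
const winningPlan = {
  stage: "SUBPLAN",
  inputStage: {
    stage: "FETCH",
    inputStage: {
      stage: "SORT_MERGE",
      inputStages: [
        { stage: "IXSCAN", indexName: "recipient_1_timestamp_1" },
        { stage: "IXSCAN", indexName: "sender_1_timestamp_1" },
      ],
    },
  },
};

const stages = collectStages(winningPlan);
console.log(stages.join(" > "));
console.log("collection scan:", stages.includes("COLLSCAN"));
console.log("in-memory sort:", stages.includes("SORT"));
```

A plan without a COLLSCAN stage and without a separate SORT stage means both the match and the ordering are served from the indexes.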

EDIT: Depending on your collection size, a sort merge might not be ideal.
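
Roughly, a sort merge combines inputs that are already sorted by repeatedly taking the entry with the smaller sort key, so no separate sort over the full result is needed. Below is a conceptual sketch in plain JavaScript with made-up data and a hypothetical `sortMerge` function. It only illustrates the idea and is not how the server implements the stage.

```javascript
// Conceptual illustration only; this is not MongoDB's implementation.
// Each input array stands for one index scan, already ordered by timestamp.
function sortMerge(bySender, byRecipient, limit) {
  const result = [];
  const seen = new Set(); // a mail sent to oneself would appear in both inputs
  let i = 0;
  let j = 0;
  while (result.length < limit && (i < bySender.length || j < byRecipient.length)) {
    // Take from whichever input currently has the smaller timestamp.
    const takeSender =
      j >= byRecipient.length ||
      (i < bySender.length && bySender[i].timestamp <= byRecipient[j].timestamp);
    const doc = takeSender ? bySender[i++] : byRecipient[j++];
    if (!seen.has(doc._id)) {
      seen.add(doc._id);
      result.push(doc);
    }
  }
  return result;
}

// Made-up documents; numeric timestamps keep the example short.
const sent = [{ _id: 1, timestamp: 10 }, { _id: 3, timestamp: 30 }, { _id: 4, timestamp: 40 }];
const received = [{ _id: 2, timestamp: 20 }, { _id: 3, timestamp: 30 }, { _id: 5, timestamp: 50 }];
const merged = sortMerge(sent, received, 4);
console.log(merged.map((d) => d._id));
```

With a limit, the loop stops as soon as enough documents have been collected, which is why the sort field has to follow the match field in each index.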

Seems to have worked! Thanks! Just had to upgrade to the latest version of MongoDB to allow index intersection.
