I've got a collection mailCollection with indexes on the .sender and .recipient values. When I do an $or query and sort by .timestamp the entire collection is scanned. How can I index this collection or rewrite the query to grab the documents where .sender or .recipient match a specific value, sorted and limited?

mailCollection indexes:

{recipient: 1}
{sender: 1}

Slow code:

email = "<some email address I want to query>"; // placeholder
cursor = mailCollection.find({$or: [{sender: email}, {recipient: email}]});
cursor.sort({timestamp: 1}).limit(100).toArray(function(error, result) {
  //yikes, full collection scan
});

If it makes a difference, I'm using the MongoDB Node.js driver.

  • Have you tried actually adding timestamp as an index? Have you tried adding timestamp as a secondary field in a compound index to either of the existing indexes? Perhaps read "Sort and Non-prefix Subset of an Index", add indexes, and run with explain(). Use hint() if needed, and then remove any indexes that are not picked up. No, the driver does not make a difference. Commented Oct 28, 2017 at 4:03

1 Answer

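Tricky. A three-way index intersection (sender, recipient and timestamp as separate single-field indexes) does not work, so you need to intersect compound indexes instead. Each compound index needs the right prefix, that is, the field you match on comes first and timestamp second; otherwise the index cannot be used efficiently for both the match and the sort.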

The data

> db.indextest.find()
{ "_id" : ObjectId("59f401893e9fcadcbf2b1694"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:03:21.468Z") }
{ "_id" : ObjectId("59f405d93e9fcadcbf2b1695"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:21:45.573Z") }
{ "_id" : ObjectId("59f408413e9fcadcbf2b1699"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:01.651Z") }
{ "_id" : ObjectId("59f408563e9fcadcbf2b169a"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:22.376Z") }
{ "_id" : ObjectId("59f408763e9fcadcbf2b169b"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:32:54.268Z") }
{ "_id" : ObjectId("59f4087e3e9fcadcbf2b169c"), "sender" : "[email protected]", "recipient" : "[email protected]", "timestamp" : ISODate("2017-10-28T04:33:02.615Z") }

The indices

I created two compound indexes, one on sender and one on recipient, each with timestamp as an additional second key. This should give you efficient queries for the most common use cases:

  • Which messages did a given user receive, sorted by date?
  • Which messages did a given user send, sorted by date?
  • And your query ;)

This gives you the most benefit for the least overhead (one extra field in each index).
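For reference, the two compound indexes could be created roughly like this. It is a minimal sketch that assumes the MongoDB Node.js driver and a collection handle such as the question's mailCollection; the helper name ensureMailIndexes is made up for illustration.

```javascript
// Hypothetical helper (not from the original answer): creates the two
// compound indexes described above on a Node.js driver collection handle.
async function ensureMailIndexes(collection) {
  // Equality field first, sort field second, so each $or branch can
  // return its matches already ordered by timestamp.
  const senderIdx = await collection.createIndex({ sender: 1, timestamp: 1 });
  const recipientIdx = await collection.createIndex({ recipient: 1, timestamp: 1 });
  // createIndex resolves to the name of the index.
  return [senderIdx, recipientIdx];
}
```

In the mongo shell the equivalent would be db.indextest.createIndex({sender: 1, timestamp: 1}) and db.indextest.createIndex({recipient: 1, timestamp: 1}).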

Given the indices

> db.indextest.getIndices()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.indextest"
    },
    {
        "v" : 1,
        "key" : {
            "recipient" : 1,
            "timestamp" : 1
        },
        "name" : "recipient_1_timestamp_1",
        "ns" : "test.indextest"
    },
    {
        "v" : 1,
        "key" : {
            "sender" : 1,
            "timestamp" : 1
        },
        "name" : "sender_1_timestamp_1",
        "ns" : "test.indextest"
    }
]

The result

Running your query:

> db.indextest.find({$or:[{sender:"[email protected]"},{recipient:"[email protected]"}]}).sort({timestamp:1}).explain()

gives the expected result (edited for brevity):

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.indextest",
        "indexFilterSet" : false,
...
        "winningPlan" : {
            "stage" : "SUBPLAN",
            "inputStage" : {
                "stage" : "FETCH",
                "inputStage" : {
                    "stage" : "SORT_MERGE",
                    "sortPattern" : {
                        "timestamp" : 1
                    },
                    "inputStages" : [
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "recipient" : 1,
                                "timestamp" : 1
                            },
                            "indexName" : "recipient_1_timestamp_1",
                            "isMultiKey" : false,
...
                            "direction" : "forward",
                            "indexBounds" : {
                                "recipient" : [
                                    "[\"[email protected]\", \"[email protected]\"]"
                                ],
                                "timestamp" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        },
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "sender" : 1,
                                "timestamp" : 1
                            },
                            "indexName" : "sender_1_timestamp_1",
                            "isMultiKey" : false,
...
                            "direction" : "forward",
                            "indexBounds" : {
                                "sender" : [
                                    "[\"[email protected]\", \"[email protected]\"]"
                                ],
                                "timestamp" : [
                                    "[MinKey, MaxKey]"
                                ]
                            }
                        }
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
...
    "ok" : 1
}
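
To check a plan like this without reading the whole document by eye, one option is to walk the winningPlan tree and collect the stage names. The sketch below is plain JavaScript operating on the object returned by explain(); the function name collectStages is hypothetical, and it only assumes the inputStage and inputStages fields visible in the output above.

```javascript
// Hypothetical helper: lists every stage name in an explain() plan tree,
// following the "inputStage" and "inputStages" fields seen above.
function collectStages(plan) {
  if (!plan) return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage].concat(...children.map(child => collectStages(child)));
}

// Trimmed copy of the winning plan shown above.
const winningPlan = {
  stage: "SUBPLAN",
  inputStage: {
    stage: "FETCH",
    inputStage: {
      stage: "SORT_MERGE",
      inputStages: [
        { stage: "IXSCAN", indexName: "recipient_1_timestamp_1" },
        { stage: "IXSCAN", indexName: "sender_1_timestamp_1" }
      ]
    }
  }
};

const stages = collectStages(winningPlan);
console.log(stages.join(" > "));
// A COLLSCAN stage here would indicate a full collection scan.
console.log("collection scan:", stages.includes("COLLSCAN"));
```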

EDIT: Depending on your collection size, a sort merge might not be ideal.
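
To make the sort merge concrete: each index scan returns its matches already ordered by timestamp, and the SORT_MERGE stage interleaves those ordered streams rather than sorting everything from scratch. The following is only a conceptual sketch in plain JavaScript, not MongoDB's implementation. The function name mergeByTimestamp and the sample documents are invented for illustration, and the sketch skips a document that shows up in both inputs (same _id), as could happen when sender and recipient are the same address.

```javascript
// Conceptual sketch only (not MongoDB's implementation): merge two arrays
// that are each already sorted by timestamp, stopping once `limit`
// documents have been produced.
function mergeByTimestamp(a, b, limit) {
  const out = [];
  const seen = new Set(); // _id values already emitted
  let i = 0;
  let j = 0;
  while (out.length < limit && (i < a.length || j < b.length)) {
    // Take from `a` when `b` is exhausted or a's next timestamp is not later.
    const takeA = j >= b.length || (i < a.length && a[i].timestamp <= b[j].timestamp);
    const doc = takeA ? a[i++] : b[j++];
    if (!seen.has(doc._id)) {
      seen.add(doc._id);
      out.push(doc);
    }
  }
  return out;
}

// Made-up sample data: numeric timestamps stand in for ISODate values.
const sent = [{ _id: 1, timestamp: 10 }, { _id: 4, timestamp: 40 }];
const received = [{ _id: 2, timestamp: 20 }, { _id: 3, timestamp: 30 }, { _id: 4, timestamp: 40 }];

console.log(mergeByTimestamp(sent, received, 100).map(d => d._id));
```

With a limit, the loop stops as soon as enough documents have been emitted, so only the front of each ordered input needs to be read.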

1 Comment

Seems to have worked! Thanks! Just had to upgrade to the latest version of MongoDB to allow index intersection.
