
Situation: I have a collection with a huge number of documents produced by a map-reduce (aggregation) job. The documents in the collection look like this:

/* 0 */
{
    "_id" : {
        "appId" : ObjectId("1"),
        "timestamp" : ISODate("2014-04-12T00:00:00.000Z"),
        "name" : "GameApp",
        "user" : "[email protected]",
        "type" : "game"
    },
    "value" : {
        "count" : 2
    }
}

/* 1 */
{
    "_id" : {
        "appId" : ObjectId("2"),
        "timestamp" : ISODate("2014-04-29T00:00:00.000Z"),
        "name" : "ScannerApp",
        "user" : "[email protected]",
        "type" : "game"
    },
    "value" : {
        "count" : 5
    }
}

...

And I search inside this collection with the aggregation framework:

db.myCollection.aggregate([match, project, group, sort, skip, limit]); // the aggregation can return results on a daily or monthly basis depending on the user's search criteria, with pagination, etc. (a sketch follows the list of criteria below)

Possible search criteria:

1. {appId, timestamp, name, user, type} 
2. {appId, timestamp}
3. {name, user}
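
To make the pipeline above concrete, here is a minimal sketch for search criterion 2 with a daily grouping; the appId value, date range, and page size are hypothetical placeholders:

// hypothetical daily report for one app over one month, with pagination
var match   = { $match: { "_id.appId": ObjectId("535d3b53e4b07f5a4a1e7f2a"), // placeholder id
                          "_id.timestamp": { $gte: ISODate("2014-04-01T00:00:00Z"),
                                             $lt:  ISODate("2014-05-01T00:00:00Z") } } };
var project = { $project: { day: "$_id.timestamp", count: "$value.count" } };
var group   = { $group: { _id: "$day", total: { $sum: "$count" } } };
var sort    = { $sort: { _id: 1 } };
var skip    = { $skip: 0 };
var limit   = { $limit: 30 };

db.myCollection.aggregate([match, project, group, sort, skip, limit]);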

I'm getting the correct result, exactly what I need. But from an optimisation point of view I have doubts about indexing.

Questions:

  1. Is it possible to create indexes for such a collection?
  2. How can I create indexes for documents with such a complex _id field?
  3. How can I do the analogue of db.collection.find().explain() to verify which index is used?
  4. And is it a good idea to index such a collection, or is it just performance paranoia?

Answer summary:

  • MongoDB creates an index on the _id field automatically, but it is not usable for queries on the individual subfields of a compound _id like the one in the example. For a field like _id: {name: "", timestamp: ""} you must create an index such as db.myCollection.ensureIndex({"_id.name": 1, "_id.timestamp": 1}); only then will your collection be properly indexed on the _id subfields (see the sketch after this list).
  • To track how your indexes are used by the aggregation framework you cannot call db.myCollection.aggregate().explain(); the proper way to do it is:

db.runCommand({ aggregate: "collection_name", pipeline: [match, proj, group, sort, skip, limit], explain: true })

  • My testing on a local computer shows that such indexing seems to be a good idea, but it requires more testing with big collections.
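
As a sketch of what the index creation could look like for the search criteria above (the field order inside the compound index is an assumption, not something prescribed by the answer):

// one compound index whose prefix {"_id.appId", "_id.timestamp"} also serves criterion 2,
// plus a separate index for criterion 3
db.myCollection.ensureIndex({ "_id.appId": 1, "_id.timestamp": 1, "_id.name": 1, "_id.user": 1, "_id.type": 1 });
db.myCollection.ensureIndex({ "_id.name": 1, "_id.user": 1 });

db.myCollection.getIndexes(); // list the indexes that now exist on the collection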

1 Answer


First, indexes 1 and 3 are probably worth investigating. As for explain, you can pass explain as an option to your pipeline; see the MongoDB aggregation documentation for the details and an example.
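
For example, assuming the same pipeline variables as in the question, the option form looks like this:

// returns the query plan instead of the aggregation results
db.myCollection.aggregate([match, project, group, sort, skip, limit], { explain: true });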


5 Comments

I saw this doc, but db.myCollection.aggregate([match, project, group, sort, skip, limit], {explain: true}) gives me a cursor without info about indexes... But how can I create indexes for this collection?
There's nothing special about those indexes. You can find how to create them in the MongoDB documentation on index creation.
If I create indexes for the collection like this: db.myCollection.ensureIndex({"_id.appId": 1, "_id.timestamp": 1}), will they take effect with db.myCollection.aggregate()?
Yes. Aggregation will try to use the indexes you've defined on a collection the same way a query will.
Further to @evanchooly's response - your #2 is really a subset of #1 and does not need to be an independent index.
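
Following up on the last two comments, a minimal sketch of creating one of those indexes and then checking via explain that the aggregation actually uses it (the collection and field names are the ones from the question; the note about the output is an assumption about typical explain results):

// create the index, then ask for the plan of the same pipeline
db.myCollection.ensureIndex({ "_id.appId": 1, "_id.timestamp": 1 });

var plan = db.runCommand({
    aggregate: "myCollection",
    pipeline: [match, project, group, sort, skip, limit],
    explain: true
});
printjson(plan); // an IXSCAN stage (rather than COLLSCAN) in the initial cursor stage
                 // indicates that the $match is using the index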
