
Situation: I have a collection with a huge number of documents produced by a map-reduce (aggregation) job. The documents in the collection look like this:

/* 0 */
{
    "_id" : {
        "appId" : ObjectId("1"),
        "timestamp" : ISODate("2014-04-12T00:00:00.000Z"),
        "name" : "GameApp",
        "user" : "[email protected]",
        "type" : "game"
    },
    "value" : {
        "count" : 2
    }
}

/* 1 */
{
    "_id" : {
        "appId" : ObjectId("2"),
        "timestamp" : ISODate("2014-04-29T00:00:00.000Z"),
        "name" : "ScannerApp",
        "user" : "[email protected]",
        "type" : "game"
    },
    "value" : {
        "count" : 5
    }
}

...

And I search inside this collection with the aggregation framework:

db.myCollection.aggregate([match, project, group, sort, skip, limit]); // the aggregation can return results on a daily or monthly basis depending on the user's search criteria, with pagination, etc. (a sketch follows the list of criteria below)

Possible search criteria:

1. {appId, timestamp, name, user, type} 
2. {appId, timestamp}
3. {name, user}
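
To make the pipeline above concrete, here is a minimal sketch for search criterion 2 with a daily grouping; the appId value, date range, and page size are hypothetical placeholders:

// hypothetical daily report for one app over one month, with pagination
var match   = { $match: { "_id.appId": ObjectId("535d3b53e4b07f5a4a1e7f2a"), // placeholder id
                          "_id.timestamp": { $gte: ISODate("2014-04-01T00:00:00Z"),
                                             $lt:  ISODate("2014-05-01T00:00:00Z") } } };
var project = { $project: { day: "$_id.timestamp", count: "$value.count" } };
var group   = { $group: { _id: "$day", total: { $sum: "$count" } } };
var sort    = { $sort: { _id: 1 } };
var skip    = { $skip: 0 };
var limit   = { $limit: 30 };

db.myCollection.aggregate([match, project, group, sort, skip, limit]);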

I'm getting the correct result, exactly what I need. But from an optimisation point of view I have doubts about indexing.

Questions:

  1. Is it possible to create indexes for such a collection?
  2. How can I create indexes for documents with such a complex _id field?
  3. How can I do the analogue of db.collection.find().explain() to verify which index is used?
  4. And is it a good idea to index such a collection, or is it just performance paranoia?

Answer summary:

  • MongoDB creates an index on the _id field automatically, but it is not usable for queries on the individual subfields of a compound _id like the one in the example. For a field like _id: {name: "", timestamp: ""} you must create an index such as db.myCollection.ensureIndex({"_id.name": 1, "_id.timestamp": 1}); only then will your collection be properly indexed on the _id subfields (see the sketch after this list).
  • To track how your indexes are used by the aggregation framework you cannot call db.myCollection.aggregate().explain(); the proper way to do it is:

db.runCommand({ aggregate: "collection_name", pipeline: [match, proj, group, sort, skip, limit], explain: true })

  • My testing on a local computer shows that such indexing seems to be a good idea, but it requires more testing with big collections.
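
As a sketch of what the index creation could look like for the search criteria above (the field order inside the compound index is an assumption, not something prescribed by the answer):

// one compound index whose prefix {"_id.appId", "_id.timestamp"} also serves criterion 2,
// plus a separate index for criterion 3
db.myCollection.ensureIndex({ "_id.appId": 1, "_id.timestamp": 1, "_id.name": 1, "_id.user": 1, "_id.type": 1 });
db.myCollection.ensureIndex({ "_id.name": 1, "_id.user": 1 });

db.myCollection.getIndexes(); // list the indexes that now exist on the collection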

1 Answer


First, indexes 1 and 3 are probably worth investigating. As for explain, you can pass explain as an option to your pipeline; see the MongoDB aggregation documentation for the details and an example.
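
For example, assuming the same pipeline variables as in the question, the option form looks like this:

// returns the query plan instead of the aggregation results
db.myCollection.aggregate([match, project, group, sort, skip, limit], { explain: true });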


5 Comments

I saw this doc, but db.myCollection.aggregate([match, project, group, sort, skip, limit], {explain: true}) gives me a cursor without info about indexes... But how can I create indexes for this collection?
There's nothing special about those indexes. You can find how to create them in the MongoDB documentation on index creation.
If I create indexes for the collection like this: db.myCollection.ensureIndex({"_id.appId": 1, "_id.timestamp": 1}), will they take effect with db.myCollection.aggregate()?
Yes. Aggregation will try to use the indexes you've defined on a collection the same way a query will.
Further to @evanchooly's response - your #2 is really a subset of #1 and does not need to be an independent index.
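
Following up on the last two comments, a minimal sketch of creating one of those indexes and then checking via explain that the aggregation actually uses it (the collection and field names are the ones from the question; the note about the output is an assumption about typical explain results):

// create the index, then ask for the plan of the same pipeline
db.myCollection.ensureIndex({ "_id.appId": 1, "_id.timestamp": 1 });

var plan = db.runCommand({
    aggregate: "myCollection",
    pipeline: [match, project, group, sort, skip, limit],
    explain: true
});
printjson(plan); // an IXSCAN stage (rather than COLLSCAN) in the initial cursor stage
                 // indicates that the $match is using the index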
