
I have an item collection with the following documents.

{ "item" : "i1", "category" : "c1", "brand" : "b1" }  
{ "item" : "i2", "category" : "c2", "brand" : "b1" }  
{ "item" : "i3", "category" : "c1", "brand" : "b2" }  
{ "item" : "i4", "category" : "c2", "brand" : "b1" }  
{ "item" : "i5", "category" : "c1", "brand" : "b2" }  

I want separate aggregation results: count by category and count by brand. Please note, it is not a count by (category, brand).

I am able to do this with map-reduce using the following code.

// Emit two keys per document: one tagged "category", one tagged "brand"
map = function () {
    emit({ type: "category", category: this.category }, 1);
    emit({ type: "brand", brand: this.brand }, 1);
};
// Sum the 1s emitted for each key
reduce = function (key, values) {
    return Array.sum(values);
};
db.item.mapReduce(map, reduce, { out: { inline: 1 } })

And the result is

{
        "results" : [
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b2"
                        },
                        "value" : 2
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c2"
                        },
                        "value" : 2
                }
        ],
        "timeMillis" : 21,
        "counts" : {
                "input" : 5,
                "emit" : 10,
                "reduce" : 4,
                "output" : 4
        },
        "ok" : 1,
}

I can get the same results by firing two separate aggregation commands, as below.

db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})

Is there any way I can do the same using the aggregation framework with a single aggregation command?

I have simplified my case here, but in reality I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do an $unwind.
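For illustration only (the order collection and lines field below are made-up names, not my real schema), the actual structure is something like the sketch here, and an initial $unwind gives the flat shape shown above:

// Hypothetical structure, for illustration only:
// { "order" : "o1", "lines" : [ { "item" : "i1", "category" : "c1", "brand" : "b1" }, ... ] }
// Unwinding the (assumed) "lines" array yields one flat document per subdocument:
db.order.aggregate(
    { $unwind: "$lines" },
    { $project: { item: "$lines.item", category: "$lines.category", brand: "$lines.brand" } }
)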

It is a real-time query (someone is waiting for the response), though on a smaller dataset, so execution time is important.

I am using MongoDB 2.4.

2 Answers


Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents:

// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate(
  { $facet: {
      categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$brand",    count: { "$sum": 1 } } }]
  }}
)
// {
//   "categories" : [
//     { "_id" : "c1", "count" : 3 },
//     { "_id" : "c2", "count" : 2 }
//   ],
//   "brands" : [
//     { "_id" : "b1", "count" : 3 },
//     { "_id" : "b2", "count" : 2 }
//   ]
// }
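
If, as the question mentions, the fields actually live in an array of subdocuments, an $unwind stage can feed the same $facet. This is only a sketch; the "order" collection and "lines" array field names are assumptions, not from the question:

// Sketch only: "order" collection and "lines" array are assumed names
db.order.aggregate([
  { $unwind: "$lines" },
  { $facet: {
      categories: [{ $group: { _id: "$lines.category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$lines.brand",    count: { "$sum": 1 } } }]
  }}
])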

1 Comment

This is the best way!

Over a large data set I would say that your current mapReduce approach would be the best one, because this aggregation technique does not work well with large data. But over a reasonably small data set it might be just what you need:

db.item.aggregate([
    // Collect every category and brand value into two arrays on a single document
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }},
    // Tuck both arrays into _id so they survive the next $unwind/$group
    { "$project": {
        "_id": {
            "categories": "$categories",
            "brands": "$brands"
        },
        "categories": 1
    }},
    // Count the categories while carrying the brands array along in _id
    { "$unwind": "$categories" },
    { "$group": {
        "_id": {
            "brands": "$_id.brands",
            "category": "$categories"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.brands",
        "categories": { "$push": {
            "category": "$_id.category",
            "count": "$count"
        }}
    }},
    // Swap: the category counts now ride in _id while the brands get counted
    { "$project": {
        "_id": "$categories",
        "brands": "$_id"
    }},
    { "$unwind": "$brands" },
    { "$group": {
        "_id": {
            "categories": "$_id",
            "brand": "$brands"
        },
        "count": { "$sum": 1 }
    }},
    // Final shape: one document holding both result arrays
    { "$group": {
        "_id": null,
        "categories": { "$first": "$_id.categories" },
        "brands": { "$push": {
            "brand": "$_id.brand",
            "count": "$count"
        }}
    }}
])

This is not exactly the same as the mapReduce output; you could throw in some more stages to change the output format, but it should be usable:

{
    "_id" : null,
    "categories" : [
            {
                    "category" : "c2",
                    "count" : 2
            },
            {
                    "category" : "c1",
                    "count" : 3
            }
    ],
    "brands" : [
            {
                    "brand" : "b2",
                    "count" : 2
            },
            {
                    "brand" : "b1",
                    "count" : 3
            }
    ]
}

As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline. Again, this will not do well for large data, but for something like "items in an order" it would probably do nicely.

Of course, as you say, you have simplified somewhat, so the first grouping key on null is either going to be something else, or narrowed down to that null case by an earlier $match stage, which is probably what you want to do.
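As a sketch of that narrowing (the "orderDate" field and the date range here are assumptions, not from the question), the pipeline would simply gain a $match ahead of the first $group:

// Sketch only: "orderDate" and the range values are assumed for illustration
db.item.aggregate([
    { "$match": {
        "orderDate": { "$gte": ISODate("2014-01-01"), "$lt": ISODate("2014-02-01") }
    }},
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }}
    // ...remaining stages as above
])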

3 Comments

Great! It works in theory! But nine pipeline stages are not intuitive or manageable. It's like doing self-joins multiple times: memory and process intensive. On a quick measurement, it takes 3 times longer than calling aggregation twice. Not the right thing for my case, as my use case requires doing this not just for items in an order, but across orders in a given time range, computing counts, price sums, etc.
@Poorna Yes, possibly so, but I did add that disclaimer at the start, and the main issue will always be the size; big arrays are a big performance problem. But I also note that doing anything outside of what you actually asked for is not actually your question, is it? So if you want a real solution to your real problem, you would be better off posting a question that actually presents it.
I liked your solution; I couldn't think of anything closer to it before posting my question. I was just explaining why it doesn't suit my case.
