
I have an item collection with the following documents.

{ "item" : "i1", "category" : "c1", "brand" : "b1" }  
{ "item" : "i2", "category" : "c2", "brand" : "b1" }  
{ "item" : "i3", "category" : "c1", "brand" : "b2" }  
{ "item" : "i4", "category" : "c2", "brand" : "b1" }  
{ "item" : "i5", "category" : "c1", "brand" : "b2" }  

I want separate aggregation results: count by category and count by brand. Please note, it is not a count by (category, brand).

I am able to do this with map-reduce using the following code.

// Emit two keys per document: one tagged "category", one tagged "brand"
map = function () {
    emit({ type: "category", category: this.category }, 1);
    emit({ type: "brand", brand: this.brand }, 1);
};
// Sum the 1s emitted for each key
reduce = function (key, values) {
    return Array.sum(values);
};
db.item.mapReduce(map, reduce, { out: { inline: 1 } })

And the result is

{
        "results" : [
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b2"
                        },
                        "value" : 2
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c2"
                        },
                        "value" : 2
                }
        ],
        "timeMillis" : 21,
        "counts" : {
                "input" : 5,
                "emit" : 10,
                "reduce" : 4,
                "output" : 4
        },
        "ok" : 1,
}

I can get the same results by firing two separate aggregation commands, as below.

db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})

Is there any way I can do the same using the aggregation framework with a single aggregation command?

I have simplified my case here, but in reality I need this grouping on fields in an array of subdocuments. Assume the above is the structure after I do an $unwind.
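For illustration only (the order collection and lines field below are made-up names, not my real schema), the actual structure is something like the sketch here, and an initial $unwind gives the flat shape shown above:

// Hypothetical structure, for illustration only:
// { "order" : "o1", "lines" : [ { "item" : "i1", "category" : "c1", "brand" : "b1" }, ... ] }
// Unwinding the (assumed) "lines" array yields one flat document per subdocument:
db.order.aggregate(
    { $unwind: "$lines" },
    { $project: { item: "$lines.item", category: "$lines.category", brand: "$lines.brand" } }
)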

It is a real-time query (someone is waiting for the response), though on a smaller dataset, so execution time is important.

I am using MongoDB 2.4.

2 Answers


Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents:

// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate(
  { $facet: {
      categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$brand",    count: { "$sum": 1 } } }]
  }}
)
// {
//   "categories" : [
//     { "_id" : "c1", "count" : 3 },
//     { "_id" : "c2", "count" : 2 }
//   ],
//   "brands" : [
//     { "_id" : "b1", "count" : 3 },
//     { "_id" : "b2", "count" : 2 }
//   ]
// }
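
If, as the question mentions, the fields actually live in an array of subdocuments, an $unwind stage can feed the same $facet. This is only a sketch; the "order" collection and "lines" array field names are assumptions, not from the question:

// Sketch only: "order" collection and "lines" array are assumed names
db.order.aggregate([
  { $unwind: "$lines" },
  { $facet: {
      categories: [{ $group: { _id: "$lines.category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$lines.brand",    count: { "$sum": 1 } } }]
  }}
])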

1 Comment

This is the best way!

Over a large data set I would say that your current mapReduce approach would be the best one, because this aggregation technique does not work well with large data. But over a reasonably small data set it might be just what you need:

db.item.aggregate([
    // Collect every category and brand value into two arrays on a single document
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }},
    // Tuck both arrays into _id so they survive the next $unwind/$group
    { "$project": {
        "_id": {
            "categories": "$categories",
            "brands": "$brands"
        },
        "categories": 1
    }},
    // Count the categories while carrying the brands array along in _id
    { "$unwind": "$categories" },
    { "$group": {
        "_id": {
            "brands": "$_id.brands",
            "category": "$categories"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.brands",
        "categories": { "$push": {
            "category": "$_id.category",
            "count": "$count"
        }}
    }},
    // Swap: the category counts now ride in _id while the brands get counted
    { "$project": {
        "_id": "$categories",
        "brands": "$_id"
    }},
    { "$unwind": "$brands" },
    { "$group": {
        "_id": {
            "categories": "$_id",
            "brand": "$brands"
        },
        "count": { "$sum": 1 }
    }},
    // Final shape: one document holding both result arrays
    { "$group": {
        "_id": null,
        "categories": { "$first": "$_id.categories" },
        "brands": { "$push": {
            "brand": "$_id.brand",
            "count": "$count"
        }}
    }}
])

This is not exactly the same as the mapReduce output; you could throw in some more stages to change the output format, but it should be usable:

{
    "_id" : null,
    "categories" : [
            {
                    "category" : "c2",
                    "count" : 2
            },
            {
                    "category" : "c1",
                    "count" : 3
            }
    ],
    "brands" : [
            {
                    "brand" : "b2",
                    "count" : 2
            },
            {
                    "brand" : "b1",
                    "count" : 3
            }
    ]
}

As you can see, this involves a fair bit of shuffling between arrays in order to group each set of either "category" or "brand" within the same pipeline. Again, this will not do well for large data, but for something like "items in an order" it would probably do nicely.

Of course, as you say, you have simplified somewhat, so the first grouping key on null is either going to be something else, or narrowed down to that null case by an earlier $match stage, which is probably what you want to do.
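As a sketch of that narrowing (the "orderDate" field and the date range here are assumptions, not from the question), the pipeline would simply gain a $match ahead of the first $group:

// Sketch only: "orderDate" and the range values are assumed for illustration
db.item.aggregate([
    { "$match": {
        "orderDate": { "$gte": ISODate("2014-01-01"), "$lt": ISODate("2014-02-01") }
    }},
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }}
    // ...remaining stages as above
])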

3 Comments

Great! It works in theory! But nine pipeline stages are not intuitive or manageable. It's like doing self-joins multiple times: memory and process intensive. On a quick measurement, it takes 3 times longer than calling aggregation twice. Not the right thing for my case, as my use case requires doing this not just for items in an order, but across orders in a given time range, computing counts, price sums, etc.
@Poorna Yes, possibly so, but I did add that disclaimer at the start, and the main issue will always be the size; big arrays are a big performance problem. But I also note that doing anything outside of what you actually asked for is not actually your question, is it? So if you want a real solution to your real problem, you would be better off posting a question that actually presents it.
I liked your solution; I couldn't think of anything closer to it before posting my question. I was just explaining why it doesn't suit my case.
