Select Group by count and distinct count in same mongodb query

Question

I am trying to do something like

select campaign_id,campaign_name,count(subscriber_id),count(distinct subscriber_id)
group by campaign_id,campaign_name from campaigns;

This query giving results except count(distinct subscriber_id)

db.campaigns.aggregate([
    {$match: {subscriber_id: {$ne: null}}},
    {$group: { 
        _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"},
        count: {$sum: 1}
    }}
])

This following query giving results except count(subscriber_id)

db.campaigns_logs.aggregate([
    {$match : {subscriber_id: {$ne: null}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name",subscriber_id: "$subscriber_id"}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"}, 
                count: {$sum: 1}
              }}
])

but I want count(subscriber_id),count(distinct subscriber_id) in the same result

Talha Awan · Accepted Answer · 2017-04-28 08:23:18Z

You are beginning to think along the right lines here as you were headed in the right direction. Changing your SQL mindset, "distinct" is really just another way of writing a $group operation in either language. That means you have two group operations happening here and, in aggregation pipeline terms, two pipeline stages.

Just with simplified documents to visualize:

{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "456"
}

It stands to reason that for the given "campaign" combination the total count and "distinct" count are "3" and "2" respectively. So the logical thing to do is "group" up all of those "subscriber_id" values first and keep the count of occurrences for each, then while thinking "pipeline", "total" those counts per "campaign" and then just count the "distinct" occurrences as a separate number:

db.campaigns.aggregate([
    { "$match": { "subscriber_id": { "$ne": null }}},

    // Count all occurrences
    { "$group": {
        "_id": {
            "campaign_id": "$campaign_id",
            "campaign_name": "$campaign_name",
            "subscriber_id": "$subscriber_id"
        },
        "count": { "$sum": 1 }
    }},

    // Sum all occurrences and count distinct
    { "$group": {
        "_id": {
            "campaign_id": "$_id.campaign_id",
            "campaign_name": "$_id.campaign_name"
        },
        "totalCount": { "$sum": "$count" },
        "distinctCount": { "$sum": 1 }
    }}
])

After the first "group" the output documents can be visualized like this:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "456"
    }, 
    "count" : 1 
}
{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "123"
    }, 
    "count" : 2
}

So from the "three" documents in the sample, "2" belong to one distinct value and "1" to another. This can still be totaled with $sum in order to get the total matching documents which you do in the following stage, with the final result:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A"
    },
    "totalCount" : 3,
    "distinctCount" : 2
}

A really good analogy for the aggregation pipeline is the unix pipe "|" operator, which allows "chaining" of operations so you can pass the output of one command through to the input of the next, and so on. Starting to think of your processing requirements in that way will help you understand operations with the aggregation pipeline better.

Yeah, I've been using aggregate for a while and never understood it until now. I'd just avoided the complexity of the pipe aspect of pipelines altogether, because mongoDB's docs are pretty confusing on this and need your example!

goto · Accepted Answer · 2017-02-22 10:05:57Z

29

SQL Query: (group by & count of distinct)

select city,count(distinct(emailId)) from TransactionDetails group by city;

The equivalent mongo query would look like this:

db.TransactionDetails.aggregate([ 
{$group:{_id:{"CITY" : "$cityName"},uniqueCount: {$addToSet: "$emailId"}}},
{$project:{"CITY":1,uniqueCustomerCount:{$size:"$uniqueCount"}} } 
]);

edited Feb 22, 2017 at 10:05

goto

8,21210 gold badges50 silver badges60 bronze badges

answered Feb 22, 2017 at 9:46

Surendranath Reddy K

6917 silver badges6 bronze badges

2 Comments

fballer87 Over a year ago

Is there a way to sort this from high to low?

Klops Over a year ago

add a third line in the aggregation list: {"$sort": {"sort_key": 1}}. key is your key name and 1 (or -1) is the sort order

Collectives™ on Stack Overflow

Select Group by count and distinct count in same mongodb query

2 Answers 2

1 Comment

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related