I have a set of (~35k) documents which look like this:

{
    "_id" : ObjectId("583dabfc7572394f93ac6ef2"),
    "updatedAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "createdAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "sourceType" : "report",
    "sourceRef" : ObjectId("583da865686e3dfbd977f059"),
    "type" : "video",
    "caption" : "lorem ipsum",
    "timestamps" : {
        "postedAt" : ISODate("2016-08-26T15:09:35.000Z"),
        "monthOfYear" : 7, // 0-based
        "dayOfWeek" : 5, // 0-based
        "hourOfDay" : 16 // 0-based
    },
    "stats" : {
        "comments" : 0,
        "likes" : 8
    },
    "user" : {
        "id" : "123456",
        "username" : "johndoe",
        "fullname" : "John",
        "picture" : ""
    },
    "images" : {
        "thumbnail" : "",
        "low" : "",
        "standard" : ""
    },
    "mentions" : [
        "janedoe"
    ],
    "tags" : [ 
        "holiday", 
        "party"
    ],
    "__v" : 0
}

I want to produce an aggregate report, which will be used to graph frequency of documents by hour of day/day of week/month of year, along with counts for mentions/tags.

{
  // Each frequency is independent of the others,
  // e.g. the total count for each frequency should
  // be ~35k.
  dayFrequency: [
    { day: 0, count: 1400 }, // Monday
    { day: 1, count: 1700 }, // Tuesday
    { day: 2, count: 1800 }, // Wednesday
    { /* etc */ },
    { day: 6, count: 1200 }  // Sunday
  ],

  monthFrequency: [
    { month: 0, count: 200 }, // January
    { month: 1, count: 250 }, // February
    { month: 2, count: 300 }, // March
    { /* etc */ },
    { month: 11, count: 150 } // December
  ],

  hourFrequency: [
    { hour: 0, count: 150 }, // 0am
    { hour: 1, count: 200 }, // 1am
    { hour: 2, count: 275 }, // 2am
    { /* etc */ },
    { hour: 23, count: 150 }, // 11pm
  ],

  mentions: {
    janedoe: 12,
    johnsmith: 11,
    peter: 54,
    /* and so on */
  },

  tags: {
    holiday: 872,
    party: 1029,
    /* and so on */
  }
}

Is this possible, and if so, how would I write it? From what I understand, as I am performing an aggregate of all matching documents, it would effectively be one group?

My code so far simply puts all matching records into one group, but I'm not sure how to move forward.

Model.aggregate([
  { $match: { sourceType: 'report', sourceRef: '583da865686e3dfbd977f059' } },
  { $group: { 
    _id: '$sourceRef'
  }}
], (err, res) => {
  console.log(err);
  console.log(res);
})

Counting frequencies as an array of counts (e.g. [ 1400, 1700, 1800, /* etc */ 1200 ]) would also be acceptable. That led me to look at $count and a few of the other operators, but again I'm not clear on their usage.

1 Answer

This is not possible in a single pipeline with MongoDB 3.2. From MongoDB 3.4 onwards, however, you can use the $facet stage, which runs multiple aggregation sub-pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document, where its results are stored as an array of documents.

For example, the report above can be produced by running the following aggregation pipeline:

Model.aggregate([
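    // Mongoose does not cast values inside aggregation pipelines, so an ObjectId field
    // must be matched with an ObjectId instance (assumes `mongoose` is required in this module).
    { "$match": { "sourceType": "report", "sourceRef": new mongoose.Types.ObjectId("583da865686e3dfbd977f059") } },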
    {
        "$facet": {
            "dayFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.dayOfWeek",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "monthFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.monthOfYear",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "hourFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.hourOfDay",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "mentions": [
                { "$unwind": "$mentions" },
                {
                    "$group": {
                        "_id": "$mentions",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "tags": [
                { "$unwind": "$tags" },
                {
                    "$group": {
                        "_id": "$tags",
                        "count": { "$sum": 1 }
                    }
                }
            ]
        }
    }
], (err, res) => {
    console.log(err);
    console.log(res);
})
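
Note that each facet comes back as an array of { _id, count } documents, in no guaranteed order, rather than in the exact shape shown in the question. As a rough sketch, assuming the result is reshaped in application code, something like the following could produce that shape. The helper names toFrequency, toCountMap and shapeReport are hypothetical and not part of MongoDB or Mongoose.

```javascript
// Hypothetical helpers for reshaping a single $facet result document.

// Turns [{ _id: 5, count: 3 }, ...] into a fixed-size, ordered list such as
// [{ day: 0, count: 0 }, ..., { day: 5, count: 3 }, ...].
function toFrequency(buckets, keyName, size) {
  // Start from zero so that buckets with no documents still appear.
  const counts = new Array(size).fill(0);
  for (const { _id, count } of buckets) {
    // Ignore keys outside the expected range (e.g. a missing field gives _id: null).
    if (Number.isInteger(_id) && _id >= 0 && _id < size) {
      counts[_id] = count;
    }
  }
  return counts.map((count, i) => ({ [keyName]: i, count }));
}

// Turns [{ _id: 'holiday', count: 2 }, ...] into { holiday: 2, ... }.
function toCountMap(buckets) {
  const map = {};
  for (const { _id, count } of buckets) {
    map[_id] = count;
  }
  return map;
}

// Builds the report shape from the question out of one $facet output document.
function shapeReport(facetResult) {
  return {
    dayFrequency: toFrequency(facetResult.dayFrequency, 'day', 7),
    monthFrequency: toFrequency(facetResult.monthFrequency, 'month', 12),
    hourFrequency: toFrequency(facetResult.hourFrequency, 'hour', 24),
    mentions: toCountMap(facetResult.mentions),
    tags: toCountMap(facetResult.tags)
  };
}
```

Since $facet emits a single document, the callback above would call shapeReport(res[0]). If a plain array of counts is preferred, mapping each frequency list with entry => entry.count gives that form.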