
I have a document structure like this:

{
    "_id" : ObjectId("59d7cd63dc2c91e740afcdb"),
    "dateJoined": ISODate("2014-12-28T16:37:17.984Z"),
    "dateActivated": ISODate("2015-02-28T16:37:17.984Z"), 
    "enrolled" : [
        { "month":-10, "enrolled":'00'},
        { "month":-9, "enrolled":'00'},
        { "month":-8, "enrolled":'01'},
        //other months
        { "month":8, "enrolled":'11'},
        { "month":9, "enrolled":'11'},
        { "month":10, "enrolled":'00'}
    ]
}

The "month" value in each enrolled entry is relative to dateJoined; it ranges from -X to +X and is pre-populated.

I would like to count the number of documents with an enrolled value of '01' for every sub-document that satisfies a condition such as "5 months before activating and 2 months after activating". All sub-document items in that range must match for the document to count as 1. [Yes, it is possible to enroll before activating :)]

As the month value is not based on dateActivated, I need to calculate this range dynamically for every document.

I am trying to use the MongoDB aggregation framework but am not sure how to do this dynamically.

db.getCollection("enrollments").aggregate([
    { $match: { /* matching condition */ } },
    { $project: {
            enrollments: {
                $filter: {
                    input: "$enrolled",
                    as: "enrollment",
                    cond: {
                        $eq: ['$$enrollment.enrolled', '01']
                        //how can I check for month value here?
                    }
                }
            }
    }}
])
  • You seem to be asking to "adjust by the difference" from the value stored in dateJoined and dateActivated. Considering that these appear to be "strings" as presented in the question, then any math is basically impossible. You need these to be stored as BSON Date format at the very least. Commented Oct 11, 2017 at 21:52
  • @NeilLunn Thank you for pointing that out. That is actually a date, I changed it in the structure. Commented Oct 11, 2017 at 22:24
  • Did you? Because the dates as represented in your question are not actually valid values for BSON Dates. If you are not copying directly from the data you have then it is best to say so truthfully, as you can then be advised how to correct it. Commented Oct 11, 2017 at 22:26
  • It's the JSON structure from RoboMongo. Please assume that they are dates, as this is something I created in a notepad from a large document I have in the DB. I can change the data types and structure to some extent if needed. Commented Oct 11, 2017 at 22:36
  • "2015-2-28T16:37:17.984Z" is not valid. Should be "2015-02-28T16:37:17.984Z" and would be if you were directly copying data. The exact same millisecond value tells us that you actually edited an existing value rather than directly copied real data. Commented Oct 11, 2017 at 22:41

2 Answers


The general ask here is to include the range for the "month" values in consideration where it is "greater than" the -5 months "before" and "less than" the +2 months "after" as recorded within the "enrolled" array entries.

The problem is that since these values are based on "dateJoined", they need to be adjusted by the correct interval between the "dateJoined" and the "dateActivated". This makes the expression effectively:

monthsDiff = (yearActivated - yearJoined)*12 + (monthActivated - monthJoined)

where month >= ( startRange + monthsDiff ) and month <= ( endRange + monthsDiff )
and enrolled = "01"

Or logically expressed "The months between the expressed range adjusted by the number of months difference between joining and activating".
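
As a worked illustration of that arithmetic, here is a minimal plain JavaScript sketch using the two dates from the sample document in the question. The function name monthsDiff and the variable names are only for illustration; UTC getters are used on the assumption that the dates are compared in UTC, as the aggregation date operators do by default.

```javascript
// Month difference between two dates, mirroring the year/month arithmetic above.
// getUTCMonth() is zero-based, but only the difference between months matters here.
function monthsDiff(dateJoined, dateActivated) {
  const years = dateActivated.getUTCFullYear() - dateJoined.getUTCFullYear();
  const months = dateActivated.getUTCMonth() - dateJoined.getUTCMonth();
  return years * 12 + months;
}

// Dates taken from the sample document in the question.
const joined = new Date("2014-12-28T16:37:17.984Z");
const activated = new Date("2015-02-28T16:37:17.984Z");

// (2015 - 2014) * 12 + (February - December) = 12 - 10 = 2
const diff = monthsDiff(joined, activated);

// "5 months before and 2 months after activating", shifted onto the
// dateJoined-relative scale used by the "month" values in the array.
const adjusted = { start: -5 + diff, end: 2 + diff };

console.log(diff, adjusted); // 2 { start: -3, end: 4 }
```

So for that document, the entries to inspect are those with "month" from -3 to 4 inclusive.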

As stated in the comments, the very first thing you need to do here is to store those date values as BSON Dates as opposed to their present apparent "string" values. Once that is done, you can then apply the following aggregation to calculate the difference from the supplied dates and filter the adjusted range accordingly from the array before counting:

var rangeStart = -5,
    rangeEnd = 2;

db.getCollection('enrollments').aggregate([
  { "$project": {
    "enrollments": {
      "$size": {
        "$filter": {
          "input": "$enrolled",
          "as": "e",
          "cond": {
            "$let": {
              "vars": {
                "monthsDiff": {
                  "$add": [
                    { "$multiply": [
                      { "$subtract": [
                        { "$year": "$dateActivated" },
                        { "$year": "$dateJoined" }
                      ]},
                      12
                    ]},
                    { "$subtract": [
                      { "$month": "$dateActivated" },
                      { "$month": "$dateJoined" }
                    ]}
                  ]
                }
              },
              "in": {
                "$and": [
                  { "$gte": [ "$$e.month", { "$add": [ rangeStart, "$$monthsDiff" ] } ] },
                  { "$lte": [ "$$e.month", { "$add": [ rangeEnd, "$$monthsDiff" ] } ] },
                  { "$eq": [ "$$e.enrolled", "01" ] }
                ]
              }
            }
          } 
        }
      }
    }
  }}
])

So this applies the same $filter to the array which you were attempting, but now takes into account the adjusted values on the range of months to filter by as well.

To make this easier to read we apply $let, which allows the common value $$monthsDiff to be calculated once and held in a variable. This is where the expression explained originally is applied, using $year and $month to extract those numeric values from the dates as stored.

Using the additional mathematical operators $add, $subtract and $multiply you can calculate both the difference in months and also later apply to adjust the "range" values in the logical conditions with $gte and $lte.

Finally, because $filter emits an array of only the entries matching the conditions, in order to "count" we apply $size which returns the length of the "filtered" array, which is the "count" of matches.
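
To check what the $filter and $size combination should return, the same logic can be mirrored in plain JavaScript against an in-memory document. This is only a sketch: the helper name countEnrolled and the five array entries are made up for illustration, and the dates are the ones from the question.

```javascript
// Invented sample document: dates from the question, array entries made up.
const doc = {
  dateJoined: new Date("2014-12-28T16:37:17.984Z"),
  dateActivated: new Date("2015-02-28T16:37:17.984Z"),
  enrolled: [
    { month: -4, enrolled: "01" }, // outside the adjusted range
    { month: -3, enrolled: "01" }, // inside, counted
    { month: 0, enrolled: "00" },  // inside, but not "01"
    { month: 4, enrolled: "01" },  // inside, counted
    { month: 5, enrolled: "01" }   // outside the adjusted range
  ]
};

// Mirrors $let (monthsDiff), $filter (the three conditions) and $size (length).
function countEnrolled(d, rangeStart, rangeEnd) {
  const monthsDiff =
    (d.dateActivated.getUTCFullYear() - d.dateJoined.getUTCFullYear()) * 12 +
    (d.dateActivated.getUTCMonth() - d.dateJoined.getUTCMonth());

  return d.enrolled.filter(e =>
    e.month >= rangeStart + monthsDiff &&
    e.month <= rangeEnd + monthsDiff &&
    e.enrolled === "01"
  ).length;
}

console.log(countEnrolled(doc, -5, 2)); // 2
```

With a two month difference between joining and activating, the range -5..2 becomes -3..4, so only the entries at months -3 and 4 are counted.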

Depending on your intended purpose, the whole expression can also be provided as the argument to $sum in a $group accumulator, if that was indeed the intention.
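
A rough sketch of what that $group form might look like is below. The helper name buildTotalPipeline and the output field name total are hypothetical, and grouping on _id: null (one total across the whole collection) is an assumption about the intent rather than something stated in the question.

```javascript
// Hypothetical helper: builds a pipeline that sums the per-document match
// counts for the given month range (relative to dateActivated).
function buildTotalPipeline(rangeStart, rangeEnd) {
  // Months between dateJoined and dateActivated, as in the $project version.
  const monthsDiff = {
    $add: [
      { $multiply: [
        { $subtract: [ { $year: "$dateActivated" }, { $year: "$dateJoined" } ] },
        12
      ]},
      { $subtract: [ { $month: "$dateActivated" }, { $month: "$dateJoined" } ] }
    ]
  };

  // Number of array entries in the adjusted range with enrolled "01".
  const matchCount = {
    $size: {
      $filter: {
        input: "$enrolled",
        as: "e",
        cond: {
          $let: {
            vars: { monthsDiff: monthsDiff },
            in: {
              $and: [
                { $gte: [ "$$e.month", { $add: [ rangeStart, "$$monthsDiff" ] } ] },
                { $lte: [ "$$e.month", { $add: [ rangeEnd, "$$monthsDiff" ] } ] },
                { $eq: [ "$$e.enrolled", "01" ] }
              ]
            }
          }
        }
      }
    }
  };

  // _id: null is an assumption: it produces a single total for all documents.
  return [ { $group: { _id: null, total: { $sum: matchCount } } } ];
}

const pipeline = buildTotalPipeline(-5, 2);
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell the result would then be passed along as db.getCollection('enrollments').aggregate(pipeline).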


2 Comments

Thank you so much. I think I almost got it working. I have a follow-up question, which I can ask as a separate question if required. Here it is: along with dateActivated and dateJoined, I have other time points like dateFirstPaid, dateLastViewed etc. So my queries will be dynamic, like 2 months before first paid and 3 months after last viewed. How can I use field names dynamically when calculating $$monthsDiff? I will mark this as the answer.
@user3731783 It's really not clear what you are asking in addition, which is really why you are meant to use the question space and not comments for asking your questions. Hence Ask a new Question when you have something different to ask than the question you asked for. Which of course showed a clear relation between the two dates, but the relation to your newly mentioned data is not explained sufficiently. It should be a completely new question.

You can try the below aggregation provided you store days instead of months.

The daysdiff value is the number of days between dateActivated and the enrollment date, where the enrollment date is dateJoined offset by the entry's day value. This gives the enrollment days relative to dateActivated.

Compare daysdiff against the following ranges:

-120 to 0 days, when enrollment is after dateActivated

0 to 150 days, when enrollment is before dateActivated

$or the above conditions together, and $and the result with the enrolled value check.

db.getCollection("enrollments").aggregate(
 {
  "$project": {
    "enrollments": {
      "$filter": {
        "input": "$enrolled",
        "as": "enrollment",
        "cond": {
          "$and": [
            {
              "$eq": [
                "$$enrollment.enrolled",
                "01"
              ]
            },
            {
              "$let": {
                "vars": {
                  "daysdiff": {
                    "$divide": [
                      {
                        "$subtract": [
                          "$dateActivated",
                          {
                            "$add": [
                              "$dateJoined",
                              {
                                "$multiply": [
                                  "$$enrollment.day",
                                  86400 * 1000
                                ]
                              }
                            ]
                          }
                        ]
                      },
                      86400 * 1000
                    ]
                  }
                },
                "in": {
                  "$or": [
                    {
                      "$and": [
                        {
                          "$lt": [
                            "$$daysdiff",
                            150
                          ]
                        },
                        {
                          "$gt": [
                            "$$daysdiff",
                            0
                          ]
                        }
                      ]
                    },
                    {
                      "$and": [
                        {
                          "$lt": [
                            "$$daysdiff",
                            0
                          ]
                        },
                        {
                          "$gt": [
                            "$$daysdiff",
                            -120
                          ]
                        }
                      ]
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }
  }
})
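
To make the day arithmetic concrete, here is a small plain JavaScript sketch of the same calculation. The names daysDiff and inWindow and the sample day offsets are invented for illustration; the dates are the ones from the question, and the per-entry "day" field is the assumption this answer is built on.

```javascript
const MS_PER_DAY = 86400 * 1000;

// Days from the enrollment date (dateJoined + day offset) to dateActivated.
// Positive means the enrollment happened before activation.
function daysDiff(dateJoined, dateActivated, day) {
  const enrollmentTime = dateJoined.getTime() + day * MS_PER_DAY;
  return (dateActivated.getTime() - enrollmentTime) / MS_PER_DAY;
}

// Same window as the pipeline: strictly between 0 and 150 days before,
// or strictly between 0 and 120 days after. A value of exactly 0 is excluded.
function inWindow(d) {
  return (d > 0 && d < 150) || (d < 0 && d > -120);
}

const joined = new Date("2014-12-28T16:37:17.984Z");
const activated = new Date("2015-02-28T16:37:17.984Z"); // 62 days after joining

console.log(daysDiff(joined, activated, 30));   // 32   (before activation)
console.log(daysDiff(joined, activated, 100));  // -38  (after activation)
console.log(daysDiff(joined, activated, -100)); // 162  (too early, outside window)
console.log(inWindow(daysDiff(joined, activated, 62))); // false, same day as activation
```

Because the comparisons are strict, an entry that falls exactly on the activation day is not matched; switching to $lte and $gte would include it if that is what you want.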
