
I'm trying to get my head around an aggregation pipeline in MongoDB with multiple groups.

I have the following data: https://gist.github.com/bomortensen/36e6b3fbc987a096be36a66bbfe30d82

The expected output would be: https://gist.github.com/bomortensen/7b220df1f1da83be838acfb2ed79a2ee (the total quantity sum, based on the highest version, per hour)

I need to write a query which does the following:

  1. Group the data by the field MeterId to get unique meter groups.
  2. Within each group, group by the StartDate's year, month, day and hour: every object's StartDate is stored in quarter-hour intervals, but I need to aggregate them into whole hours.
  3. Finally, I need to take only the highest version from the Versions array, by VersionNumber.
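
The three steps above can be pinned down with a plain JavaScript sketch of the intended computation on an in-memory array. This is handy for checking a pipeline's output by hand. The sketch assumes each document is shaped like `{ _id: { MeterId, StartDate }, Versions: [{ VersionNumber, Quantity }] }`, as the queries in this thread imply. The sample values and the `hourlyTotals` helper are hypothetical. "Highest version" is taken per quarter-hour document, and all entries with that version number are summed.

```javascript
// Hypothetical in-memory sample, shaped like the documents the pipelines in this thread query.
const docs = [
  { _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:00:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 5 }, { VersionNumber: 2, Quantity: 7.5 }] },
  { _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:15:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 4 }] },
  { _id: { MeterId: "1234", StartDate: new Date("2016-09-20T03:00:00Z") },
    Versions: [{ VersionNumber: 3, Quantity: 2 }, { VersionNumber: 1, Quantity: 9 }] },
];

const HOUR_MS = 60 * 60 * 1000;

// Per MeterId and per whole UTC hour, sum the quantities of each quarter's highest version.
function hourlyTotals(documents) {
  const totals = new Map();
  for (const doc of documents) {
    // Step 3: the highest VersionNumber within this quarter-hour document.
    const maxVersion = Math.max(...doc.Versions.map(v => v.VersionNumber));
    const quantity = doc.Versions
      .filter(v => v.VersionNumber === maxVersion)
      .reduce((sum, v) => sum + v.Quantity, 0);
    // Steps 1 and 2: key on the meter and on StartDate truncated to the top of its UTC hour,
    // which fixes year, month, day and hour at once.
    const hourStart = Math.floor(doc._id.StartDate.getTime() / HOUR_MS) * HOUR_MS;
    const key = `${doc._id.MeterId}|${hourStart}`;
    totals.set(key, (totals.get(key) || 0) + quantity);
  }
  return [...totals].map(([key, QuantitySum]) => {
    const [MeterId, hourStart] = key.split("|");
    return { MeterId, StartDate: new Date(Number(hourStart)), QuantitySum };
  });
}

console.log(hourlyTotals(docs));
```

With this sample, meter 1234 gets 7.5 + 4 = 11.5 for the 02:00 hour and 2 for the 03:00 hour.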

I've tried the following query, but must admit I'm stuck:

mycollection.aggregate([        
    { $group: { 
            _id : { ediel: "$_id.MeterId", start: "$_id.StartDate" },
            versions: { $push: "$Versions" }            
        } 
    },
    { $unwind: { path: "$versions" } },
    { $group: {
            _id: {
                hour: { $hour: "$_id.start.DateTime" },
                key: "$_id"                                
            },              
            quantitySum: { $sum: "$Versions.Quantity" }          
         } 
    },
    { $sort: { "_id.hour": -1 } }
]);

Does anyone know how I should do this? :-)

  • Please add the expected output. (Commented Oct 31, 2016 at 13:24)
  • I added the expected output to my post, @Veeram :-) (Commented Oct 31, 2016 at 13:35)

2 Answers

The pipeline has these stages:

  • 1 $project to get the $hour from the date and create a maxVersion field per record
  • 1 $unwind to flatten the Versions array
  • 1 $project to add a keep field holding a boolean that says whether the record should be kept
  • 1 $match that keeps only the highest version number, i.e. keep == true
  • 1 $group that groups by MeterId/hour and sums the quantity
  • 1 $project to produce your required format

The query is:

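db.mycollection.aggregate([{
    // Hour of each quarter-hour document and its highest VersionNumber
    $project: {
        _id: 1,
        Versions: 1,
        hour: {
            "$hour": "$_id.StartDate"
        },
        maxVersion: { $max: "$Versions.VersionNumber" }
    }
}, {
    // One document per element of Versions
    $unwind: "$Versions"
}, {
    // Flag the entries that carry the highest version number
    $project: {
        _id: 1,
        Versions: 1,
        hour: 1,
        maxVersion: 1,
        keep: { $eq: ["$Versions.VersionNumber", "$maxVersion"] }
    }
}, {
    $match: { "keep": true }
}, {
    // Sum per meter and per clock hour; the day is part of the key so
    // the same hour on different dates is not merged
    $group: {
        _id: {
            MeterId: "$_id.MeterId",
            day: { $dateToString: { format: "%Y-%m-%d", date: "$_id.StartDate" } },
            hour: "$hour"
        },
        // Earliest quarter of the hour
        StartDate: { $min: "$_id.StartDate" },
        QuantitySum: { $sum: "$Versions.Quantity" }
    }
}, {
    $project: {
        _id: { MeterId: "$_id.MeterId", StartDate: "$StartDate" },
        hour: "$_id.hour",
        QuantitySum: 1
    }
}])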
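
The first four stages of the query above can be traced on a single document in plain JavaScript. This makes the roles of maxVersion and keep easier to see. The document below is hypothetical; only its shape follows the thread.

```javascript
// Hypothetical quarter-hour document, shaped like those the query above reads.
const doc = {
  _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:15:00Z") },
  Versions: [
    { VersionNumber: 1, Quantity: 5 },
    { VersionNumber: 2, Quantity: 7.5 },
  ],
};

// $project: the hour (MongoDB's $hour defaults to UTC) and the document's highest VersionNumber.
const hour = doc._id.StartDate.getUTCHours();
const maxVersion = Math.max(...doc.Versions.map(v => v.VersionNumber));

// $unwind: one record per element of the Versions array.
const unwound = doc.Versions.map(v => ({ _id: doc._id, Versions: v, hour, maxVersion }));

// $project + $match: flag the records holding the highest version and keep only those.
const kept = unwound
  .map(r => ({ ...r, keep: r.Versions.VersionNumber === r.maxVersion }))
  .filter(r => r.keep);

console.log(kept.map(r => r.Versions));
```

Only the VersionNumber 2 entry survives, so it alone reaches the $group stage.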

In your expected output, you only take the first entry with the highest VersionNumber into account. For hour 2 and MeterId 1234 you have { "VersionNumber" : 2, "Quantity" : 7.5 } and { "VersionNumber" : 2, "Quantity" : 8.4 }, but you only take { "VersionNumber" : 2, "Quantity" : 7.5 }.

I don't know whether this is intended, but in case you want only the first entry with the highest version number, I added these stages after the $match:

  • 1 $group that pushes the previously filtered versions into an array
  • 1 $project that uses $slice on this array to take only its first element
  • 1 $unwind to remove this array (which contains only one element)

The query that matches your output is:

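db.mycollection.aggregate([{
    // Hour of each quarter-hour document and its highest VersionNumber
    $project: {
        _id: 1,
        Versions: 1,
        hour: {
            "$hour": "$_id.StartDate"
        },
        maxVersion: { $max: "$Versions.VersionNumber" }
    }
}, {
    $unwind: "$Versions"
}, {
    // Flag the entries that carry the highest version number
    $project: {
        _id: 1,
        Versions: 1,
        hour: 1,
        maxVersion: 1,
        keep: { $eq: ["$Versions.VersionNumber", "$maxVersion"] }
    }
}, {
    $match: { "keep": true }
}, {
    // Rebuild one document per quarter holding its max-version entries
    $group: {
        _id: { _id: "$_id.MeterId", StartDate: "$_id.StartDate" },
        Versions: { $push: "$Versions" },
        hour: { "$first": "$hour" }
    }
}, {
    // Keep only the first max-version entry
    $project: {
        _id: 1,
        hour: 1,
        Versions: { $slice: ["$Versions", 1] }
    }
}, {
    $unwind: "$Versions"
}, {
    // Sort by meter, then StartDate, so $first below picks the earliest quarter of each hour
    $sort: {
        _id: 1
    }
}, {
    // Sum per meter and per clock hour; the day is part of the key so
    // the same hour on different dates is not merged
    $group: {
        _id: {
            MeterId: "$_id._id",
            day: { $dateToString: { format: "%Y-%m-%d", date: "$_id.StartDate" } },
            hour: "$hour"
        },
        StartDate: { $first: "$_id.StartDate" },
        QuantitySum: { $sum: "$Versions.Quantity" }
    }
}, {
    $project: {
        _id: { MeterId: "$_id.MeterId", StartDate: "$StartDate" },
        Hour: "$_id.hour",
        QuantitySum: 1
    }
}])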

The output is:

{ "_id" : { "MeterId" : "4567", "StartDate" : ISODate("2016-09-20T03:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 3 }
{ "_id" : { "MeterId" : "4567", "StartDate" : ISODate("2016-09-20T02:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 2 }
{ "_id" : { "MeterId" : "1234", "StartDate" : ISODate("2016-09-20T03:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 3 }
{ "_id" : { "MeterId" : "1234", "StartDate" : ISODate("2016-09-20T02:00:00Z") }, "QuantitySum" : 25.9, "Hour" : 2 }
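
The difference between the two queries can be reproduced on the two VersionNumber 2 entries quoted above for MeterId 1234 at hour 2. The first query sums every entry that carries the highest version, while the $push/$slice stages keep only the first such entry. Here is a plain JavaScript sketch; the variable names are illustrative only.

```javascript
// The two VersionNumber 2 entries quoted above (MeterId 1234, hour 2).
const versions = [
  { VersionNumber: 2, Quantity: 7.5 },
  { VersionNumber: 2, Quantity: 8.4 },
];

const maxVersion = Math.max(...versions.map(v => v.VersionNumber));
const maxEntries = versions.filter(v => v.VersionNumber === maxVersion);

// First query: every entry with the highest version is summed.
const sumAllMax = maxEntries.reduce((sum, v) => sum + v.Quantity, 0);

// Second query ($push, then $slice: [array, 1]): only the first such entry is kept.
const sumFirstMax = maxEntries.slice(0, 1).reduce((sum, v) => sum + v.Quantity, 0);

console.log(sumAllMax, sumFirstMax);
```

Here `sumAllMax` comes to 7.5 + 8.4 = 15.9, while `sumFirstMax` is 7.5, the value the expected output uses.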

Sorry, I couldn't find a straightforward way to round to the hour. You can try the following. Unwind the versions so you can apply the grouping that collects the max version, and push the versions for the next step. Then project to filter the records matching the max version, and finally project to sum the max version's quantities. Right now the start date is the minimum from the group, so you should be fine as long as you have versions at the top of the hour.

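db.collection.aggregate([{
    // One document per element of Versions
    $unwind: {
        path: "$Versions"
    }
}, {
    // Bucket by meter and clock hour; the day is part of the key so
    // the same hour on different dates is not merged
    $group: {
        _id: {
            MeterId: "$_id.MeterId",
            day: {
                $dateToString: { format: "%Y-%m-%d", date: "$_id.StartDate" }
            },
            start: {
                $hour: "$_id.StartDate"
            }
        },
        // Earliest quarter in the bucket
        startDate: {
            $min: "$_id.StartDate"
        },
        // Highest version across the whole hour bucket
        maxVersion: {
            $max: "$Versions.VersionNumber"
        },
        Versions: {
            $push: "$Versions"
        }
    }
}, {
    $sort: {
        "_id.day": -1,
        "_id.start": -1
    }
}, {
    // Keep only the versions matching the bucket's max version
    $project: {
        _id: {
            MeterId: "$_id.MeterId",
            StartDate: "$startDate"
        },
        hour: "$_id.start",
        Versions: {
            $filter: {
                input: "$Versions",
                as: "version",
                cond: {
                    $eq: ["$maxVersion", "$$version.VersionNumber"]
                }
            }
        }
    }
}, {
    // Sum the quantities of the remaining versions
    $project: {
        _id: 1,
        hour: 1,
        QuantitySum: {
            $sum: "$Versions.Quantity"
        }
    }
}]);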

Sample Output

{
    "_id": {
        "MeterId": "1234",
        "StartDate": ISODate("2016-09-20T02:00:00Z")
    },
    "QuantitySum": 15,
    "hour": 2
}
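
The stages of the pipeline above can be mirrored in plain JavaScript on a small, hypothetical sample, which helps when checking its behaviour. As in the $group stage, the maximum VersionNumber is taken over the whole (MeterId, hour) bucket rather than per quarter. StartDate is the earliest quarter in the bucket. The bucket key in this sketch also includes the UTC date, so the same hour on different days stays separate.

```javascript
// Hypothetical quarter-hour documents for one meter within one hour.
const docs = [
  { _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:15:00Z") },
    Versions: [{ VersionNumber: 1, Quantity: 3 }, { VersionNumber: 2, Quantity: 7.5 }] },
  { _id: { MeterId: "1234", StartDate: new Date("2016-09-20T02:00:00Z") },
    Versions: [{ VersionNumber: 2, Quantity: 7.5 }] },
];

// $unwind + $group: bucket by meter and clock hour, tracking the max version,
// the earliest StartDate ($min) and every version entry ($push).
const buckets = new Map();
for (const doc of docs) {
  const start = doc._id.StartDate;
  const key = `${doc._id.MeterId}|${start.toISOString().slice(0, 10)}|${start.getUTCHours()}`;
  if (!buckets.has(key)) {
    buckets.set(key, { MeterId: doc._id.MeterId, hour: start.getUTCHours(),
                       startDate: start, maxVersion: -Infinity, Versions: [] });
  }
  const b = buckets.get(key);
  if (start < b.startDate) b.startDate = start;
  for (const v of doc.Versions) {
    b.maxVersion = Math.max(b.maxVersion, v.VersionNumber);
    b.Versions.push(v);
  }
}

// $project with $filter, then $sum: keep the entries equal to the bucket's max version.
const results = [...buckets.values()].map(b => ({
  _id: { MeterId: b.MeterId, StartDate: b.startDate },
  hour: b.hour,
  QuantitySum: b.Versions
    .filter(v => v.VersionNumber === b.maxVersion)
    .reduce((sum, v) => sum + v.Quantity, 0),
}));

console.log(results);
```

With this sample, both quarters contribute their VersionNumber 2 entry, so the bucket's QuantitySum is 7.5 + 7.5 = 15. The version 1 entry is dropped.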

  • Thank you so much for your answer, Veeram! Greatly appreciated :-) It all makes sense, but one question remains: what if I only want the quantity sum from the highest version in the Versions array?
