0

I have a mongo collection containing documents that have sub documents that I want to sum the fields of - an example of what I am hoping to achieve is below

The general structure of each document is

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 311,
            "b" : 1481,
            ...
            "x" : {"a" : 311, "b" : 19.965999999999998},
            "y" : {"a" : 200, "b" : 14.174000000000003
            }
        },
        "2": {
            "a" : 500,
            "b" : 100,
            ...
            "x" : {"a" : 123, "b" : 198},
            "y" : {"a" : 200, "b" : 13.7}
        },
        ... // May not all be present
        "12": {...}
        }
    }
}

The reason monthly is stored as an object as opposed to an array is that some months may not be present.

Take for example the three documents

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 10,
            "b" : 20,
            ...
            "x" : {"a" : 15, "b" :30}
            }
        },
        "2": {
            "a" : 500,
            "b" : 100,
            ...
            "x" : {"a" : 40, "b" : 50},
        },
        "7": {
            "a": 300,
            "b": 90,
            ...
            "x": {"a": 4, "b": 5}
        }
    }
}

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 15,
            "b" : 25,
            ...
            "x" : {"a" : 20, "b" : 35},
        },
        "2": {
            "a" : 250,
            "b" : 200,
            ...
            "x" : {"a" : 60, "b" : 80},
        }
    }
}


{
    "pool" : "Bar",
    "monthly-figures" : {
        "1": {
            "a" : 300,
            "b" : 400,
            ...
            "x" : {"a" : 51, "b" : 3},
            }
        },
        "6": {
            "a" : 75,
            "b" : 135,
            ...
            "x" : {"a" : 12.5, "b" : 16},
        }
    }
}

What I want to achieve through aggregation is to group based on the pool field, then sum the values contained within the monthly-figures - so the resulting two documents would look like

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 25,
            "b" : 45,
            ...
            "x" : {"a" : 35, "b" : 65},
        },
        "2": {
            "a" : 750,
            "b" : 300,
            ...
            "x" : {"a" : 100, "b" : 130},
        },
        "7": {
            "a": 300,
            "b": 90,
            ...
            "x": {"a": 4, "b": 5}
        }
    }
}

(Document with pool of Bar would be identical as there is only 1)

It doesn't matter if a month has all 0 values after the aggregation (if say the month doesn't exist on any document of the grouping) but ideally it wouldn't?

I've come up with this query which does work, but I feel is not the best way to do it - lots of repetition - how can I improve?

{$group: {
        // Group to pool
        _id: "$pool",

        // Sum grouped documents
        "1a": {$sum: "$monthly-figures.1.a"},
        "1b": {$sum: "$monthly-figures.1.b"},
        ...
        "1xa": {$sum: "$monthly-figures.1.x.a"},
        "1xb": {$sum: "$monthly-figures.1.x.b"},


        "2a": {$sum: "$monthly-figures.2.a"},


        ... Continue all the way down to 12
    }
},
{$project: {
        "_id": 0,
        "pool": "$_id",

        "monthly-figures": {

            "1": {
                "a": "$1a",
                "b": "$1b",
                ...
                "x": {
                    "a": "$1xa",
                    "b": "$1xb"
                }
            },
            "2": {
                "a": "$2a",
                ...
            }

            ... Continue all the way down to 12
        }
    }
}

Any ideas on a cleaner pipeline? Cheers!

1 Answer 1

1

One way to eliminate the need to list each month is to turn the monthly-figures object into an array. You said:

The reason monthly is stored as an object as opposed to an array is that some months may not be present.

But an array would still work because you could have:

[{month: 1, figures: {...}}, {month: 6, figures: {...}}]

With monthly-figures an array, we can $unwind each element of the array into its own document. Now it's possible to do a $group on both the pool and month to get the sums. To gather up the months for each pool we can do another $group on just the pool and $push specially formed objects, containing the month and figures, onto an array called monthly-figures. Those objects are special because the k and v keys are recognized by the $arrayToObject operator that's used in the next stage to get back the original form. Here's the query:

db.colx.aggregate([{
    "$project": {
        "pool": 1,
        "monthly-figures": {"$objectToArray": "$monthly-figures"}
    }
}, {
    "$unwind": "$monthly-figures"
}, {
    "$group": {
        "_id": {
            "pool": "$pool",
            "month": "$monthly-figures.k",      
        },
        "a": {"$sum": "$monthly-figures.v.a"},
        "b": {"$sum": "$monthly-figures.v.b"},
        "x_a": {"$sum": "$monthly-figures.v.x.a"},
        "x_b": {"$sum": "$monthly-figures.v.x.b"}
    }
}, {
    "$group": {
        "_id": "$_id.pool",
        "monthly-figures": {
            "$push": {
                "k": "$_id.month",
                "v": {
                    "a": "$a",
                    "b": "$b",
                    "x": {
                        "a": "$x_a",
                        "b": "$x_b"
                    }
                }
            }
        }
    }
}, {
    "$project": {
        "_id": 0,
        "pool": "$_id",
        "monthly-figures": {"$arrayToObject": "$monthly-figures"}
    }
}])

Here's the output from the query:

{
    "pool" : "Bar",
    "monthly-figures" : {
        "6" : {
            "a" : 75,
            "b" : 135,
            "x" : {
                "a" : 12.5,
                "b" : 16
            }
        },
        "1" : {
            "a" : 300,
            "b" : 400,
            "x" : {
                "a" : 51,
                "b" : 3
            }
        }
    }
}
{
    "pool" : "Foo",
    "monthly-figures" : {
        "1" : {
            "a" : 25,
            "b" : 45,
            "x" : {
                "a" : 35,
                "b" : 65
            }
        },
        "2" : {
            "a" : 750,
            "b" : 300,
            "x" : {
                "a" : 100,
                "b" : 130
            }
        },
        "7" : {
            "a" : 300,
            "b" : 90,
            "x" : {
                "a" : 4,
                "b" : 5
            }
        }
    }
}

Links to MongoDB Documentation:

Sign up to request clarification or add additional context in comments.

1 Comment

Brilliant. I did wonder about using $objectToArray but couldn't think how it would fit into the pipeline. Has helped my understanding a lot - thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.