I have the following document structure (simplified for this example)

{
  _id : ObjectId("sdfsdf"),
  result : [1, 3, 5, 7, 9]
},
{
  _id : ObjectId("asdref"),
  result : [2, 4, 6, 8, 10]
}

I want to get the sum of those result arrays, but not a total sum; instead, a new array corresponding to the element-wise sum of the original arrays, i.e.

result : [3, 7, 11, 15, 19]

I have searched through the myriad questions here and a few come close (e.g. this one, this one, and this one), but I can't quite get there.

I can get the sum of each array fine

aggregate(
    [
      {
        "$unwind" : "$result"
      },
      {
        "$group": {
          "_id": "$_id",
          "results" : { "$sum" : "$result" }
        }
      }
    ]
)

which gives me

[ { _id: sdfsdf, results: 25 },
  { _id: asdref, results: 30 } ]

but I can't figure out how to get the sum of each element

2 Answers

You can use includeArrayIndex if you have MongoDB 3.2 or newer.

You just need to change your $unwind stage.

Your pipeline should look like this:

db.collection.aggregate(
    [
      {
        // unwind and keep each element's position in "arrayIndex"
        "$unwind" : { "path": "$result", "includeArrayIndex": "arrayIndex" }
      },
      {
        // sum the values that share the same array position
        "$group": {
          "_id": "$arrayIndex",
          "results" : { "$sum" : "$result" }
        }
      },
      {
        // restore positional order before rebuilding the array
        "$sort": { "_id": 1 }
      },
      {
        // push the per-position sums back into a single array
        "$group": {
          "_id": null,
          "results": { "$push": "$results" }
        }
      },
      {
        "$project": { "_id": 0, "results": 1 }
      }
    ]
)
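
For the two sample documents in the question, that pipeline should return something along the lines of the following (a sketch of the expected result shape, not captured shell output):

{ "results" : [ 3, 7, 11, 15, 19 ] }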

16 Comments

Sort of works, but instead of creating a single array it simply lists the sums one by one, e.g. "NumberLong(3)", "results: 15".
I have updated the answer with a projection, can you test it again?
SyntaxError: invalid property id @(shell):11:2
Ok, I fixed the syntax, can you try again?
We should add sorting before the second grouping. I have added it to the answer.

There is an alternate approach to this, though mileage may vary on how practical it is. It involves using $push to create an "array of arrays" and then applying $reduce, introduced in MongoDB 3.4, to $sum those arrays element by element into a single array result:

db.collection.aggregate([
  { "$group": {
    "_id": null,
    "result": { "$push": "$result" }
  }},
  { "$addFields": {
    "result": {
      "$reduce": {
        "input": "$result",
        "initialValue": [],
        "in": {
          "$map": {
            "input": { 
              "$zip": {
                "inputs": [ "$$this", "$$value" ],
                "useLongestLength": true
              }
            },
            "as": "el",
            "in": { "$sum": "$$el" }
          }
        }
      }
    }
  }}
])

The real trick there is in the "input" to $map, where the $zip operation creates a transposed, "pairwise" list from the two input arrays.

On the first iteration, $$value is the empty array supplied as the initialValue to $reduce and $$this is the first array found, so the "zipped" output looks like:

[ [1,null], [3,null], [5,null], [7,null], [9,null] ]

Because useLongestLength is set and no defaults are given, $zip pads the shorter (empty) array with null out to the length of the current array and "zips" the two together as above.

Processing with $map, each of those pairs is passed to $sum, which ignores the null padding and "reduces" the returned result to:

[ 1, 3, 5, 7, 9 ]
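
To see that behaviour in isolation, here is a minimal sketch (not part of the original answer) that runs the inner $zip / $map / $sum expression over literal arrays standing in for "$$this" and "$$value" on the first iteration; "firstIteration" is just an illustrative field name:

db.collection.aggregate([
  // take any one document just to have something to project against
  { "$limit": 1 },
  { "$project": {
    "_id": 0,
    "firstIteration": {
      "$map": {
        "input": {
          "$zip": {
            "inputs": [ [ 1, 3, 5, 7, 9 ], [] ],
            "useLongestLength": true
          }
        },
        "as": "el",
        "in": { "$sum": "$$el" }
      }
    }
  }}
])

which should return { "firstIteration" : [ 1, 3, 5, 7, 9 ] }, since $sum simply ignores the null padding.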

On the second iteration, the next entry in the "array of arrays" is picked up as $$this and "zipped" with the previous "reduced" content in $$value, giving:

[ [2,1], [4,3], [6,5], [8,7], [10,9] ]

Which is then subject to the $map for each element using $sum again to produce:

[ 3, 7, 11, 15, 19 ]

Since only two arrays were pushed into the "array of arrays", that is the end of the operation and the final result. Otherwise $reduce would keep iterating until every element of the input had been processed.
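
Putting it together, for the two sample documents the pipeline above should produce a single document roughly like this (again just a sketch of the expected shape):

{ "_id" : null, "result" : [ 3, 7, 11, 15, 19 ] }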


In some cases this is the more performant option and the one you should use. Note, though, that when grouping on null you are asking "every" document to $push its content into a single array for the result.

This could break the BSON document size limit in extreme cases, so when aggregating positional array content over large result sets it is probably best to use $unwind with the includeArrayIndex option instead.

Or, indeed, take a good look at the process as a whole: if the "positional array" in question is itself the result of some other "aggregation operation", then look instead at the earlier pipeline stages that produced it. If you want those positions "aggregated further" into new totals, do that "before" the positional result is obtained.
