How to get (or aggregate) distinct keys of array in MongoDB

Question

I'm trying to get MongoDB to aggregate over an array whose elements have different key-value pairs, without knowing the keys in advance (just a simple sum would be OK).

Example docs:

{data: [{a: 3}, {b: 7}]}
{data: [{a: 5}, {c: 12}, {f: 25}]}
{data: [{f: 1}]}
{data: []}

So basically each doc (or really its array) can have zero or more entries, and I don't know the keys of those objects, but I want to sum and average the values over those keys.

Right now I'm just loading a bunch of docs and doing it myself in Node, but I'd like to offload that work to MongoDB.

I know I can $unwind the array first, but how do I proceed from there? How do I compute the sum/avg/min/max of the values if I don't know the keys?
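For reference, the client-side processing the question describes (loading documents and aggregating in Node) might look roughly like the sketch below. summarizeByKey is a hypothetical name, and the code assumes every value is a number; it walks each array element with Object.entries, so it needs no prior knowledge of the keys.

```javascript
// Hypothetical helper: per-key sum/avg/min/max over documents shaped like
// { data: [{ a: 3 }, { b: 7 }] }, where the key names are not known in advance.
// Assumes all values are numbers.
function summarizeByKey(docs) {
  const stats = {}; // key name -> { sum, count, min, max, avg }
  for (const doc of docs) {
    for (const entry of doc.data || []) {
      // Each array element may carry one or more unknown keys.
      for (const [key, value] of Object.entries(entry)) {
        if (!stats[key]) {
          stats[key] = { sum: 0, count: 0, min: Infinity, max: -Infinity };
        }
        const s = stats[key];
        s.sum += value;
        s.count += 1;
        s.min = Math.min(s.min, value);
        s.max = Math.max(s.max, value);
      }
    }
  }
  // Derive the average once every value has been seen.
  for (const s of Object.values(stats)) {
    s.avg = s.sum / s.count;
  }
  return stats;
}

const sampleDocs = [
  { data: [{ a: 3 }, { b: 7 }] },
  { data: [{ a: 5 }, { c: 12 }, { f: 25 }] },
  { data: [{ f: 1 }] },
  { data: [] },
];

console.log(summarizeByKey(sampleDocs)); // a: sum 8, avg 4; f: sum 26, avg 13
```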

Blakes Seven · Accepted Answer · 2015-06-27 03:28:54Z

If you do not know the keys and cannot make a reasonable educated guess, then you basically cannot go any further with the aggregation framework. You could supply "all of the keys" for consideration, but I suspect your actual data looks more like this:

{ "data": [{ "film": 10 }, { "television": 5 }, { "boardGames": 1 }] }

So there would be little point in finding out all the "key names" here and then throwing them at an aggregation statement.
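To illustrate what "supplying all of the keys" would involve: each key needs its own accumulator in the $group stage, so the pipeline would have to be generated from a key list discovered beforehand. The sketch below only builds such a pipeline object in plain JavaScript; buildPipelineForKeys and the "_sum"/"_avg" output field names are hypothetical, and it assumes plain key names (no dots and no leading "$").

```javascript
// Hypothetical helper: builds an aggregation pipeline for the original
// { data: [{ a: 3 }, { b: 7 }] } shape from a list of already-known key names.
// Assumes plain field names (no "." and no leading "$").
function buildPipelineForKeys(keys) {
  const group = { _id: null };
  for (const key of keys) {
    // $sum and $avg skip missing values, so elements lacking this key contribute nothing.
    group[key + "_sum"] = { $sum: "$data." + key };
    group[key + "_avg"] = { $avg: "$data." + key };
  }
  return [{ $unwind: "$data" }, { $group: group }];
}

const pipeline = buildPipelineForKeys(["a", "b", "c", "f"]);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell this object would be passed to db.collection.aggregate(pipeline).
```

Every newly appearing key name would require regenerating the pipeline, which is why this approach only makes sense when the set of keys is small and stable.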

For the record though, "this is why you do not structure your data storage like this". Information like "film" here should not be used as a "key" name, because it is useful "data" that could be searched upon and most importantly "indexed" in a database system.

So your data should really look like this:

{ 
    "data": [
        { "type": "film", "value": 10 },
        { "type": "television", "value": 5 },
        { "type": "boardGames", "value": 1 }
    ]
}
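Converting existing documents to this shape is mechanical. A minimal sketch of the per-document transform in plain JavaScript follows; toTypeValue is a hypothetical helper, and how the converted documents are written back (for example with one update per document) depends on the driver you use. An element holding several keys becomes several type/value pairs.

```javascript
// Hypothetical helper: converts one document from the "key name as data" shape
// ({ data: [{ film: 10 }] }) to the type/value shape
// ({ data: [{ type: "film", value: 10 }] }). Other fields are kept as-is.
function toTypeValue(doc) {
  const data = [];
  for (const entry of doc.data || []) {
    // Every key in the element becomes its own { type, value } pair.
    for (const [type, value] of Object.entries(entry)) {
      data.push({ type: type, value: value });
    }
  }
  return { ...doc, data: data };
}

const before = { data: [{ film: 10 }, { television: 5 }, { boardGames: 1 }] };
console.log(JSON.stringify(toTypeValue(before)));
```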

Then the aggregation statement is simple, as are many other things:

db.collection.aggregate([
    { "$unwind": "$data" },               // one document per array element
    { "$group": {
        "_id": null,                      // a single group over all elements
        "sum": { "$sum": "$data.value" },
        "avg": { "$avg": "$data.value" }
    }}
])
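To make the semantics of the two stages concrete, here is a plain-JavaScript model of what this pipeline computes; it does not connect to MongoDB, and unwind and groupAll are hypothetical names. It mirrors three behaviors of the real stages: $unwind drops documents whose array is empty (by default), $sum and $avg skip missing or non-numeric values, and a $group that receives no documents produces no output. One consequence is that a misspelled field such as "valule" is silently skipped rather than reported.

```javascript
// A plain-JavaScript model of the pipeline above; it does not talk to MongoDB.
// $unwind: one output document per array element; empty/missing arrays yield nothing.
function unwind(docs) {
  const out = [];
  for (const doc of docs) {
    for (const element of doc.data || []) {
      out.push({ ...doc, data: element });
    }
  }
  return out;
}

// $group with _id: null: one bucket accumulating the sum and average of data.value.
// Missing or non-numeric values are skipped, as $sum and $avg do.
function groupAll(unwound) {
  if (unwound.length === 0) return []; // no input documents, no group output
  let sum = 0;
  let count = 0;
  for (const doc of unwound) {
    const v = doc.data.value;
    if (typeof v === "number") {
      sum += v;
      count += 1;
    }
  }
  return [{ _id: null, sum: sum, avg: count > 0 ? sum / count : null }];
}

const restructuredDocs = [
  { data: [{ type: "film", value: 10 }, { type: "television", value: 5 }, { type: "boardGames", value: 1 }] },
  { data: [] },
];
console.log(groupAll(unwind(restructuredDocs))); // sum 16, avg 16 / 3
```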

But since the key names keep changing from document to document and there is no uniform structure, you need JavaScript processing on the server to traverse the keys, and that means mapReduce:

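db.collection.mapReduce(
    function() {                          // map: called once per document, bound to `this`
        this.data.forEach(function(data) {
            Object.keys(data).forEach(function(key) {
                emit(null,data[key]); // emit the value regardless of key name
            });
        });
    },
    function(key,values) {                // reduce: combine the values emitted for one key
        return Array.sum(values);     // Just summing for example
    },
    { "out": { "inline": 1 } }            // return the results inline rather than to a collection
)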
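The map/reduce contract can also be modelled in plain JavaScript to check what the functions above produce for the question's sample documents. This is a sketch, not MongoDB's engine: runMapReduce is a hypothetical harness, emit is passed in as a parameter rather than being available globally as it is on the server, and a plain reduce replaces Array.sum, which exists only in MongoDB's JavaScript environment.

```javascript
// Hypothetical harness modelling mapReduce (no MongoDB involved): map is called
// with `this` bound to each document, emitted values are collected per key,
// then reduce is called once per key.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit);
  }
  const results = [];
  for (const [key, values] of emitted) {
    // MongoDB does not call reduce for a key with a single value; mirror that.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Same logic as the shell map function, with emit passed in explicitly.
function mapFn(emit) {
  this.data.forEach(function (data) {
    Object.keys(data).forEach(function (key) {
      emit(null, data[key]); // emit the value regardless of key name
    });
  });
}

// Plain summation standing in for Array.sum.
function reduceFn(key, values) {
  return values.reduce((total, v) => total + v, 0);
}

const questionDocs = [
  { data: [{ a: 3 }, { b: 7 }] },
  { data: [{ a: 5 }, { c: 12 }, { f: 25 }] },
  { data: [{ f: 1 }] },
  { data: [] },
];
console.log(runMapReduce(questionDocs, mapFn, reduceFn)); // one result: _id null, value 53
```

Because MongoDB may call reduce more than once for the same key, feeding earlier reduce output back in as an input value, the reduce function must give the same answer when its own output is included. A plain sum satisfies that; averaging directly inside reduce would not.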

And of course the JavaScript execution here will work much more slowly than the native coded operators available to the aggregation framework.

So this should be an object lesson in why you do not use "data" as "key names" when storing data in a database. The aggregation framework works with standard structures and is fast; falling back to JavaScript processing is more flexible, but the cost is mostly in speed and in other features.

Slaps hand on forehead -- time to rewrite everything now that I realize I put 'data' in the name of a key! Thanks!
