I have a database containing some large objects, all with the same keys and structure:

{
  "stats": {
    "a": 100,
    "b": 0,
    "c": 30,
    "d": 20,
    ...
    "z": 100
  }
},
{
  "stats": {
    "a": 200,
    "b": 2,
    "c": 10,
    "d": 40,
    ...
    "z": 100
  }
}

I would like to know if there is a way, using PyMongo, to aggregate all the stats sub-objects (averaging each key across documents) without specifying every field. The desired output would be this:

"stats": {
  "a": 150,
  "b": 1,
  "c": 20,
  "d": 30,
  ...
  "z": 100
}

I found this question, "Mongodb Is it possible to aggregate an object?", but I am unsure how to apply it in PyMongo.

EDIT: I could list all fields and aggregate them, but I am looking for a solution not listing those fields (I have approximately 100 of them).

  • Just to confirm: you want the average of the values for each key, is that right? Commented Apr 2, 2017 at 16:00
  • Yes, that's exactly what I am looking for Commented Apr 2, 2017 at 23:11

1 Answer

There is no built-in way to do what you're asking, at least not that I am aware of.

One thing you could do is build the pipeline dynamically in Python. Since every document has the same fields, you could call find_one, read the set of field names from the returned document, and build an aggregation pipeline from that.

For example:

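import pprint

from pymongo import MongoClient

# Connect once and reuse the client (defaults to localhost:27017).
client = MongoClient()
db = client.test
collection = db.foo

pp = pprint.PrettyPrinter(indent=4)

# Start with a $group stage that collapses every document into one group.
pipeline = [{
    '$group': {
        '_id': None
    }
}]

group = pipeline[0]['$group']

# Every document shares the same keys, so one sample document is enough
# to discover the field names.
sample = collection.find_one()
if sample is None:
    raise SystemExit('collection is empty')

# Add one $avg accumulator per key under "stats".
for k in sample['stats']:
    group[k] = {'$avg': '$stats.' + k}

pp.pprint(pipeline)

cursor = collection.aggregate(pipeline, allowDiskUse=True)

for result in cursor:
    pp.pprint(result)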

Output:

[   {   '$group': {   '_id': None,
                      u'a': {   '$avg': u'$stats.a'},
                      u'b': {   '$avg': u'$stats.b'},
                      u'c': {   '$avg': u'$stats.c'},
                      u'd': {   '$avg': u'$stats.d'},
                      u'z': {   '$avg': u'$stats.z'}}}]
{   u'_id': None, u'a': 150.0, u'b': 1.0, u'c': 20.0, u'd': 30.0, u'z': 100.0}
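Note that the result above is flat, whereas the question asks for the averages nested under "stats". A minimal sketch of one way to get that shape is to append a $project stage that moves each averaged field back under "stats". The helper name build_avg_pipeline and its prefix parameter are hypothetical, and the sketch assumes no stats key is named "_id" or contains a dot:

```python
def build_avg_pipeline(stat_keys, prefix="stats"):
    """Return a $group + $project pipeline averaging each key under `prefix`.

    `stat_keys` is any iterable of field names, e.g. the keys of the
    "stats" sub-document returned by find_one().
    """
    group = {"_id": None}
    nested = {}
    for key in sorted(stat_keys):
        # Average "<prefix>.<key>" across all documents.
        group[key] = {"$avg": "${}.{}".format(prefix, key)}
        # Re-nest the averaged value under "<prefix>" in the output.
        nested[key] = "$" + key
    return [
        {"$group": group},
        {"$project": {"_id": 0, prefix: nested}},
    ]


if __name__ == "__main__":
    # Stand-in for collection.find_one(); no database is needed to build the pipeline.
    sample = {"stats": {"a": 100, "b": 0}}
    print(build_avg_pipeline(sample["stats"]))
```

The returned list can then be passed to collection.aggregate() exactly as in the code above.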

2 Comments

Isn't there a way to do this with a map-reduce function? I can't figure out how to write it, but this solution is a pretty good start.
You could probably do this with map-reduce, but the aggregation framework is going to perform better. Map-reduce should be a last resort.
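If the server is MongoDB 3.4.4 or newer, the aggregation framework can also do this without reading the field names first, by using the $objectToArray and $arrayToObject operators. The following is only a sketch under that version assumption, and the helper name stats_average_pipeline is hypothetical:

```python
def stats_average_pipeline(field="stats"):
    """Return a pipeline that averages every key of the `field` sub-document.

    Assumes MongoDB 3.4.4+ ($objectToArray / $arrayToObject).
    """
    return [
        # Turn {"a": 100, "b": 0} into [{"k": "a", "v": 100}, {"k": "b", "v": 0}].
        {"$project": {"_id": 0, "kv": {"$objectToArray": "$" + field}}},
        # Emit one document per key/value pair.
        {"$unwind": "$kv"},
        # Average the values of each key across all documents.
        {"$group": {"_id": "$kv.k", "avg": {"$avg": "$kv.v"}}},
        # Collect the per-key averages back into a single array.
        {"$group": {"_id": None, "kv": {"$push": {"k": "$_id", "v": "$avg"}}}},
        # Rebuild one sub-document from the key/value array.
        {"$project": {"_id": 0, field: {"$arrayToObject": "$kv"}}},
    ]


if __name__ == "__main__":
    for stage in stats_average_pipeline():
        print(stage)
```

The list would be passed to collection.aggregate() like any other pipeline. The order of keys in the resulting sub-document is not guaranteed, and the $unwind creates one intermediate document per key per input document, so on large collections the dynamically built $group may well be cheaper.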
