
The MongoDB database of my website stores a single document for each user. Each user answers a couple of questionnaire forms during his visit. The forms are stored in an array, but since their fields don't overlap, a single flat document would suffice. For analysis, I want to produce a flat table of all the answers across all the forms.

Consider the following data structure:

{
    "USER_SESSION_ID": 456,
    "forms": [
        {
            "age": 21,
            "gender": "m"
        },
        {
            "job": "Student",
            "years_on_job": "12"
        },
        {
            "Hobby": "Hiking",
            "Twitter": "@my_account"
        }
    ]
},
{
    "USER_SESSION_ID": 678,
    "forms": [
        {
            "age": 46,
            "gender": "f"
        },
        {
            "job": "Bodyguard",
            "years_on_job": "2"
        },
        {
            "Hobby": "Skiing",
            "Twitter": "@bodyguard"
        }
    ]
}

The form documents all look different and have no conflicting fields, so I would like to merge them, yielding a flat, tabular structure like this:

{ 'USER_SESSION_ID': 456, 'age': 21, 'gender': 'm', 'job': 'Student', ... 'Twitter': '@my_account' }
{ 'USER_SESSION_ID': 678, 'age': 46, 'gender': 'f', 'job': 'Bodyguard',  ... 'Twitter': '@bodyguard' }

In Python, this is a total no-brainer:

for session in sessions:          # Iterate all docs
    for form in session['forms']: # Iterate all children
        session.update(form)      # Integrate to parent doc
    del session['forms']          # Remove nested child
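A self-contained version of that loop, run against the two sample sessions, might look like the sketch below. The helper name flatten_session is hypothetical; unlike the loop above, it builds a new dict for each session instead of mutating the input, so the original documents stay untouched.

```python
from typing import Any

# The two sample sessions from above, as plain Python dicts.
sessions: list[dict[str, Any]] = [
    {
        "USER_SESSION_ID": 456,
        "forms": [
            {"age": 21, "gender": "m"},
            {"job": "Student", "years_on_job": "12"},
            {"Hobby": "Hiking", "Twitter": "@my_account"},
        ],
    },
    {
        "USER_SESSION_ID": 678,
        "forms": [
            {"age": 46, "gender": "f"},
            {"job": "Bodyguard", "years_on_job": "2"},
            {"Hobby": "Skiing", "Twitter": "@bodyguard"},
        ],
    },
]


def flatten_session(session: dict[str, Any]) -> dict[str, Any]:
    """Return a new flat dict: the parent's fields plus every form's fields."""
    # Copy every top-level field except the nested 'forms' array.
    flat = {key: value for key, value in session.items() if key != "forms"}
    for form in session["forms"]:
        flat.update(form)  # later forms would win on key conflicts (none here)
    return flat


rows = [flatten_session(s) for s in sessions]
for row in rows:
    print(row)
```

Each printed row matches the target structure shown earlier, and sessions still contains its forms arrays afterwards.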

In MongoDB I find this quite hard to achieve. I am trying to use the aggregation pipeline, which I imagine should be suitable for this.

So far I have unwound my data structure, like this:

db.sessions.aggregate([
    {
        '$unwind': '$forms'
    },
    { 
        '$project': {
            'USER_SESSION_ID': true,
            'forms': true
        }
    },
    {
        '$group': {
            '_id': '$USER_SESSION_ID',
            'forms': <magic?!>  // the missing "merge" accumulator
        }
    }
])

In the unwinding stage, I create one document per child, each carrying the parent's data. This should be roughly equivalent to the nested for loop in my Python code. However, what I feel I'm conceptually missing is a "merge" accumulator for the grouping stage. In Python, this is done with dict.update(); in underscore.js it would be _.extend(destination, *sources).
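To make the missing step concrete, the Python sketch below imitates what $unwind followed by a $group with a "merge" accumulator would compute. The helpers unwind and group_merge are hypothetical illustrations of the pipeline semantics, not MongoDB APIs; the merge accumulator here is simply repeated dict.update().

```python
from typing import Any, Iterable, Iterator


def unwind(docs: Iterable[dict[str, Any]], field: str) -> Iterator[dict[str, Any]]:
    """Mimic $unwind: yield one copy of the parent per element of doc[field]."""
    for doc in docs:
        for element in doc[field]:
            yield {**doc, field: element}


def group_merge(docs: Iterable[dict[str, Any]], key: str, field: str) -> list[dict[str, Any]]:
    """Mimic $group on `key`, merging every doc[field] dict into one result."""
    groups: dict[Any, dict[str, Any]] = {}
    for doc in docs:
        merged = groups.setdefault(doc[key], {"_id": doc[key]})
        merged.update(doc[field])  # the hypothetical "merge" accumulator
    return list(groups.values())


sessions = [
    {"USER_SESSION_ID": 456, "forms": [{"age": 21, "gender": "m"}, {"job": "Student"}]},
    {"USER_SESSION_ID": 678, "forms": [{"age": 46, "gender": "f"}, {"job": "Bodyguard"}]},
]

unwound = list(unwind(sessions, "forms"))
print(len(unwound))  # 4: one document per (session, form) pair
print(group_merge(unwound, "USER_SESSION_ID", "forms"))
```

The first stage reproduces the unwinding described above; the second is the step the pipeline in the question is missing.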

How do I achieve this within MongoDB?

2 Answers

Try the following, which iterates over the find() cursor with nested forEach() calls and copies each element of the forms array onto its parent document, using Object.keys() to get the element's fields:

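db.sessions.find().forEach(function (doc) {
    // Copy every field of every form onto the parent document
    doc.forms.forEach(function (e) {
        var keys = Object.keys(e);
        keys.forEach(function (key) { doc[key] = e[key]; });
    });
    delete doc.forms;  // drop the now-redundant nested array
    // Overwrite the stored document (this modifies the collection in place)
    db.sessions.replaceOne({ _id: doc._id }, doc);
});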
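For comparison, the same in-place rewrite driven from Python might look like the sketch below. It assumes a pymongo-style collection object whose find() yields dicts with an _id and whose replace_one(filter, replacement) replaces a single document; flatten_in_place is a hypothetical helper name. Like the shell version, it overwrites the stored documents.

```python
from typing import Any


def flatten_in_place(collection: Any) -> int:
    """Merge each document's 'forms' into the document and write it back.

    Assumes `collection` behaves like a pymongo Collection:
    find() yields dicts with an '_id', and replace_one(filter, replacement)
    replaces the matching document. Returns the number of documents rewritten.
    """
    count = 0
    for doc in collection.find():
        # Remove the nested array and copy each form's fields onto the parent.
        for form in doc.pop("forms", []):
            doc.update(form)
        collection.replace_one({"_id": doc["_id"]}, doc)
        count += 1
    return count
```

Against a real server this would be called with something like MongoClient()["mydb"]["sessions"], where "mydb" is a placeholder database name.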

2 Comments

Thank you for your answer! I managed to solve my problem using the mapReduce function, which allows for further processing! I love how close to my Python code you got, though; I didn't know this was possible! Thanks a lot!
@cessor No worries, glad you found the solution.

I played around with the aggregation pipeline for ages until I gave the mapReduce command a try. This is what I came up with:

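db.sessions.mapReduce(
    // map: merge all forms of one session into a single object
    function () {
        var merged = {};
        this.forms.forEach(function (form) {
            for (var key in form) {
                merged[key] = form[key];
            }
        });
        emit(this.USER_SESSION_ID, merged);
    },
    // reduce: required, but not called as long as each USER_SESSION_ID is emitted once
    function () {},
    {
         "out": {"inline": true}  // return the results instead of writing a collection
    }
)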

The map step does the merging, since at the time there was no merge operator available in the aggregation pipeline (MongoDB 3.6 later added $mergeObjects). The reduce function is required even though it is empty. The out option either writes the results to a different collection or returns them directly (inline, which is what I'm doing here).
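One detail that is easy to miss: mapReduce returns each emitted pair as a document with _id and value fields, so the merged answers sit one level down rather than forming a completely flat row. The Python sketch below imitates that output shape for the sample data; map_sessions is a hypothetical helper, and it assumes each USER_SESSION_ID occurs once, which is why no reduce step is needed.

```python
from typing import Any


def map_sessions(sessions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Imitate the map step plus inline output: one {_id, value} per session."""
    results = []
    for session in sessions:
        merged: dict[str, Any] = {}
        for form in session["forms"]:
            merged.update(form)  # same effect as the for...in loop in the map function
        # emit(this.USER_SESSION_ID, merged) shows up as {_id: key, value: merged}
        results.append({"_id": session["USER_SESSION_ID"], "value": merged})
    return results


sessions = [{"USER_SESSION_ID": 456, "forms": [{"age": 21}, {"Twitter": "@my_account"}]}]
print(map_sessions(sessions))
# [{'_id': 456, 'value': {'age': 21, 'Twitter': '@my_account'}}]
```

As with the inline mapReduce output, the input documents are left unchanged.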

It looks a lot like the method chridam showed in his answer, but it produces a projection of the data instead. His version is much closer to how my Python code works, since both modify the original documents, whereas a projection leaves the original collection unchanged. For what I'm trying to do, a projection is fine, and not changing the input collection is quite useful!
