19

Given a document

{_id:110000, groupings:{A:'AV',B:'BV',C:'CV',D:'DV'},coin:{old:10,new:12}}

My requirements call for the attributes used for mapping and aggregation to be chosen at runtime: the groupings a user is interested in are not known up front, but are specified by the user at runtime.

For example, one user would specify [A,B] which will cause mapping emissions of

emit( {A:this.groupings.A,B:this.groupings.B},this.coin )

while another would want to specify [A,C] which will cause mapping emissions of

emit( {A:this.groupings.A,C:this.groupings.C},this.coin )

Because the map and reduce functions execute server-side and don't have access to client variables, I haven't been able to come up with a way to use a variable map key in the map function.

If I could reference a list of things to group by from within the scope in which the map function executes, this would all be very straightforward. However, because the map function runs in a different scope, I don't know how to do this, or whether it's even possible.

Before I start dynamically building JavaScript to execute through the driver, does anyone have a better suggestion? Maybe a 'group' function will handle this scenario better?

2 Answers

41

As pointed out by @Dave Griffith, you can use the scope parameter of the mapReduce function.

I struggled a bit to figure out how to properly pass it to the function because, as pointed out by others, the documentation is not very detailed. Finally, I realised that mapReduce is expecting 3 params:

  • map function
  • reduce function
  • an options object with one or more of the parameters defined in the docs (such as out and scope)

Eventually, I arrived at the following code in JavaScript:

// I define a variable external to my map and to my reduce functions
var KEYS = {STATS: "stats"};

function m() {
    // I use my global variable inside the map function
    emit(KEYS.STATS, 1);
}

function r(key, values) {
    // I use a helper function
    return sumValues(values);
}

// Helper function in the global scope
function sumValues(values) {
    var result = 0;
    values.forEach(function(value) {
        result += value;
    });
    return result;
}

db.something.mapReduce(
    m,
    r,
    {
         out: {inline: 1},
         // I use the scope param to pass in my variables and functions
         scope: {
             KEYS: KEYS,
             sumValues: sumValues // of course, you can pass function objects too
         }
    }
);
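
Applied to the document in the question, the same idea might look like the sketch below. It is only a sketch under assumptions: GROUP_KEYS, mapByGroupKeys, reduceCoins and runLocally are hypothetical names, and runLocally is a small stand-in that imitates how the server exposes scope values and emit, so the functions can be tried without a MongoDB instance.

```javascript
// GROUP_KEYS is a hypothetical name; it is expected to arrive through `scope`.
function mapByGroupKeys() {
    var key = {};
    // Build the emit key from whichever grouping names the user picked.
    GROUP_KEYS.forEach(function (name) {
        key[name] = this.groupings[name];
    }, this);
    emit(key, this.coin);
}

function reduceCoins(key, values) {
    // Returns the same shape as the emitted values, as reduce must.
    var total = {old: 0, new: 0};
    values.forEach(function (v) {
        total.old += v.old;
        total.new += v.new;
    });
    return total;
}

// In the mongo shell, the call would presumably be:
// db.something.mapReduce(mapByGroupKeys, reduceCoins, {
//     out: {inline: 1},
//     scope: {GROUP_KEYS: ["A", "B"]}
// });

// Local stand-in for the server (an assumption, for experimenting only):
// it publishes GROUP_KEYS and emit as globals, then groups and reduces.
function runLocally(docs, groupKeys) {
    var groups = {};
    globalThis.GROUP_KEYS = groupKeys;
    globalThis.emit = function (key, value) {
        var id = JSON.stringify(key);
        groups[id] = groups[id] || {key: key, values: []};
        groups[id].values.push(value);
    };
    docs.forEach(function (doc) {
        mapByGroupKeys.call(doc); // the server binds `this` to each document
    });
    return Object.keys(groups).map(function (id) {
        var g = groups[id];
        return {_id: g.key, value: reduceCoins(g.key, g.values)};
    });
}
```

Changing the grouping from [A,B] to [A,C] then only means passing a different array in scope; the map and reduce functions stay the same.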

4 Comments

Yes, you can pass in functions, but you have to define them inline (not just the var reference), e.g.: scope : {keys:['a','b','c'],sumValues:function(a,b,c){return a+b+c;}}
I don't think so. I have written and used code where some helper functions are called from my map or my reduce. They are defined as standard functions and I pass them to the scope object as in my sample above.
Sharded setup? Nope. I only use a single Mongo server.
@gbegley Is there anything wrong with writing each key and value separately in the scope instead of passing a dictionary?
17

You can pass global, read-only data into map-reduce functions using the "scope" parameter on the map-reduce command. It's not very well documented, I'm afraid.
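
As a rough illustration of the command-level form, the sketch below builds a mapReduce command document with a scope field. The helper name buildMapReduceCommand, the variable name GROUP_KEYS and the collection name are assumptions made for the example; the field names mapReduce, map, reduce, out and scope are those of the mapReduce database command.

```javascript
// Hypothetical helper: returns a document for the `mapReduce` command.
function buildMapReduceCommand(collectionName, groupKeys) {
    return {
        mapReduce: collectionName,
        map: function () {
            // GROUP_KEYS is not defined here; the server injects it from `scope`.
            var key = {};
            for (var i = 0; i < GROUP_KEYS.length; i++) {
                key[GROUP_KEYS[i]] = this.groupings[GROUP_KEYS[i]];
            }
            emit(key, 1); // count documents per group
        },
        reduce: function (key, values) {
            var count = 0;
            for (var i = 0; i < values.length; i++) {
                count += values[i];
            }
            return count;
        },
        out: {inline: 1},
        // Global values made visible to the map, reduce and finalize functions.
        scope: {GROUP_KEYS: groupKeys}
    };
}

// In the mongo shell this would presumably be run as:
// db.runCommand(buildMapReduceCommand("something", ["A", "C"]));
```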

4 Comments

Thanks, I think that is what I need. The only example I could find that even mentions scope is this one: github.com/mongodb/mongo/blob/master/jstests/mr5.js, and in this example, I didn't see where either the M or R function used the passed-in 'scope'. If you can point me at an example of using the passed-in scope in a map/reduce/finalize (any), I will be very grateful.
That was the only example I was able to find as well. A couple of days ago I ran some tests to see if it actually worked, and if so how. It did work (at least in the Java driver, but not the Scala one), and was what I needed to pass in data from an SQL query into a map-reduce, but sadly that code is proprietary.
Maybe a bit old, but I found another example of using scope: stackoverflow.com/questions/21522927/… I too am looking for a way to access data from other collections during a MR, and I think I'll need to pass those collections in the scope variable.
