I have a MovieRatings database with the columns userId, movieId, movie-categoryId, reviewId, movieRating and reviewDate.
In my mapper I want to extract userId -> (movieId, movieRating).
Then, in the reducer, I want to group all (movieId, movieRating) pairs by user.
Here is my attempt:
Map function:
var map = function() {
    var values = {movieId : this.movieId, movieRating : this.movieRating};
    emit(this.userId, values);
}
Reduce function:
var reduce = function(key, values) {
    var ratings = [];
    values.forEach(function(V) {
        var temp = {movieId : V.movieId, movieRating : V.movieRating};
        Array.prototype.push.apply(ratings, temp);
    });
    return {userId : key, ratings : ratings};
}
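The empty ratings arrays are consistent with how Function.prototype.apply treats its second argument: it is read as an array-like (a length property plus indexed elements). A plain object such as temp has no length, so push receives no arguments and ratings stays empty. This small plain Node.js sketch contrasts the two calls; the values in temp are made up for illustration.

```javascript
// Why Array.prototype.push.apply(ratings, temp) adds nothing when temp
// is a plain object rather than an array.
const temp = { movieId: 1, movieRating: 5 };

// apply() spreads its second argument into an argument list. A plain
// object has no `length`, so it becomes an empty argument list.
const viaApply = [];
Array.prototype.push.apply(viaApply, temp);

// Calling push() with the object itself appends one element,
// which is what the reducer intended.
const viaPush = [];
viaPush.push(temp);

console.log(viaApply.length); // 0
console.log(viaPush.length);  // 1
```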
Run MapReduce:
db.ratings.mapReduce(map, reduce, { out: "map_reduce_step1" })
Output of db.map_reduce_step1.find():
{ "_id" : 1, "value" : { "userId" : 1, "ratings" : [ ] } }
{ "_id" : 2, "value" : { "userId" : 2, "ratings" : [ ] } }
{ "_id" : 3, "value" : { "userId" : 3, "ratings" : [ ] } }
{ "_id" : 4, "value" : { "userId" : 4, "ratings" : [ ] } }
{ "_id" : 5, "value" : { "userId" : 5, "ratings" : [ ] } }
{ "_id" : 6, "value" : { "userId" : 6, "ratings" : [ ] } }
{ "_id" : 7, "value" : { "userId" : 7, "ratings" : [ ] } }
{ "_id" : 8, "value" : { "userId" : 8, "ratings" : [ ] } }
{ "_id" : 9, "value" : { "userId" : 9, "ratings" : [ ] } }
{ "_id" : 10, "value" : { "userId" : 10, "ratings" : [ ] } }
{ "_id" : 11, "value" : { "userId" : 11, "ratings" : [ ] } }
{ "_id" : 12, "value" : { "userId" : 12, "ratings" : [ ] } }
{ "_id" : 13, "value" : { "userId" : 13, "ratings" : [ ] } }
{ "_id" : 14, "value" : { "userId" : 14, "ratings" : [ ] } }
{ "_id" : 15, "value" : { "movieId" : 1, "movieRating" : 3 } }
{ "_id" : 16, "value" : { "userId" : 16, "ratings" : [ ] } }
I am not getting the expected output. Every ratings array is empty, and the value for _id 15 has a different shape from all the others. This output makes no sense to me!
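Two documented properties of MongoDB's mapReduce most likely explain the rest of this output. First, reduce is not called for a key that has only one emitted value; that value is stored unchanged, which would explain why _id 15 (presumably a user with a single rating) shows { movieId, movieRating } instead of { userId, ratings }. Second, reduce may be called more than once for the same key, with earlier reduce results passed back in as values, so it must return an object of the same shape as the values map emits. A minimal sketch under those constraints wraps each pair in a one-element ratings array and concatenates arrays in reduce. The simulateMapReduce harness is hypothetical, written only so the pair can be exercised in plain Node.js; the sample documents are the ones from Edit 2, trimmed to the relevant fields.

```javascript
// Corrected map/reduce pair (sketch). Emitted values and reduce results
// share one shape, { ratings: [...] }, so re-reducing partial results
// and the single-value case both stay consistent.
var map = function () {
  emit(this.userId, {
    ratings: [{ movieId: this.movieId, movieRating: this.movieRating }]
  });
};

var reduce = function (key, values) {
  var ratings = [];
  values.forEach(function (v) {
    // Each v is either an emitted value or an earlier reduce result;
    // both carry a ratings array, so concatenation is safe.
    ratings = ratings.concat(v.ratings);
  });
  return { ratings: ratings };
};

// In the mongo shell this would be run as before:
//   db.ratings.mapReduce(map, reduce, { out: "map_reduce_step1" })

// Hypothetical local harness (not part of MongoDB) that mimics grouping
// by key and skipping reduce for single-value keys.
function simulateMapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  global.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { mapFn.call(doc); });
  delete global.emit;
  return Object.keys(groups).map(function (key) {
    var vals = groups[key];
    var value = vals.length === 1 ? vals[0] : reduceFn(Number(key), vals);
    return { _id: Number(key), value: value };
  });
}

var sample = [
  { userId: 4, movieId: 1, movieRating: 4 },
  { userId: 5, movieId: 1, movieRating: 4 },
  { userId: 4, movieId: 2, movieRating: 5 },
  { userId: 4, movieId: 3, movieRating: 5 }
];

console.log(JSON.stringify(simulateMapReduce(sample, map, reduce)));
```

Because userId is already the output _id, it is not repeated inside value. The order of pairs inside ratings depends on the order in which MongoDB feeds values to reduce, so it should not be relied on.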
Here is the Python equivalent of what I am trying to do in the reducer (in case its purpose wasn't clear above):
def reducer_ratings_by_user(self, user_id, itemRatings):
    # Group (item, rating) pairs by userID
    ratings = []
    for movieID, rating in itemRatings:
        ratings.append((movieID, rating))
    yield user_id, ratings
Edit 1 @chridam
Here is an outline of what I actually want to do:
The Movies.csv file looks like this:
userId,movieId,movie-categoryId,reviewId,movieRating,reviewDate
1,1,1,1,5,7/12/2000
2,1,1,2,5,7/12/2000
3,1,1,3,5,7/12/2000
4,1,1,4,4,7/12/2000
5,1,1,5,4,7/12/2000
6,1,1,6,5,7/15/2000
1,2,1,7,4,7/25/2000
8,1,1,8,4,7/28/2000
9,1,1,9,3,8/3/2000
...
...
I import it into MongoDB:
mongoimport --db SomeName --collection ratings --type csv --headerline --file Movies.csv
Then I am trying to apply the map-reduce functions defined above. After that, I will export the result back to CSV with something like:
mongoexport --db SomeName --collection map_reduce_step1 --csv --out movie_ratings_out.csv --fields ...
The resulting movie_ratings_out.csv file should look like this:
userId,movieId1,rating1,movieId2,rating2,...
1,1,5,2,4
...
...
So each row contains all the (movie, rating) pairs for one user.
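Because each user has a different number of (movie, rating) pairs, a fixed --fields list may not map directly onto this variable-width layout. One option is to build each row in a small script. This sketch assumes grouped documents of the form { userId, movieList: [{ movieId, movieRating }] }, the shape given in Edit 2; toCsvRow is a hypothetical helper, not part of mongoexport.

```javascript
// Flatten one grouped document into a CSV row of the form
// userId,movieId1,rating1,movieId2,rating2,...
// Assumes the { userId, movieList: [{ movieId, movieRating }] } shape.
function toCsvRow(doc) {
  var fields = [doc.userId];
  doc.movieList.forEach(function (m) {
    // Each pair contributes two consecutive columns.
    fields.push(m.movieId, m.movieRating);
  });
  return fields.join(",");
}

// User 1 in the Movies.csv excerpt rated movie 1 with 5 and movie 2 with 4.
var row = toCsvRow({
  userId: 1,
  movieList: [
    { movieId: 1, movieRating: 5 },
    { movieId: 2, movieRating: 4 }
  ]
});
console.log(row); // 1,1,5,2,4
```

In the mongo shell, the same function could be applied with something like db.<collection>.find().forEach(function (d) { print(toCsvRow(d)); }) and the shell output redirected to a file; <collection> is a placeholder for wherever the grouped documents are stored.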
Edit 2
Sample:
db.ratings.find().pretty()
{
"_id" : ObjectId("57f4a0dd9cb74fc4d344a40f"),
"userId" : 4,
"movieId" : 1,
"movie-categoryId" : 1,
"reviewId" : 4,
"movieRating" : 4,
"reviewDate" : "7/12/2000"
}
{
"_id" : ObjectId("57f4a0dd9cb74fc4d344a410"),
"userId" : 5,
"movieId" : 1,
"movie-categoryId" : 1,
"reviewId" : 5,
"movieRating" : 4,
"reviewDate" : "7/12/2000"
}
{
"_id" : ObjectId("57f4a0dd9cb74fc4d344a411"),
"userId" : 4,
"movieId" : 2,
"movie-categoryId" : 1,
"reviewId" : 6,
"movieRating" : 5,
"reviewDate" : "7/15/2000"
}
{
"_id" : ObjectId("57f4a0dd9cb74fc4d344a412"),
"userId" : 4,
"movieId" : 3,
"movie-categoryId" : 1,
"reviewId" : 2,
"movieRating" : 5,
"reviewDate" : "7/12/2000"
}
...
After MapReduce, the expected JSON output is:
{
"_id" : ....,
"userId" : 4,
"movieList" : [ {
"movieId" : 2,
"movieRating" : 5
},
{
"movieId" : 1,
"movieRating" : 4
}
...
]
}
{
"_id" : ....,
"userId" : 5,
"movieList" : ...
}
...
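For this expected output, an aggregation pipeline with $group and $push is an alternative to mapReduce that avoids the reduce-shape constraints entirely. This is a sketch: the variable name pipeline is arbitrary, and it assumes the field names shown in the sample documents. With _id set to "$userId", the _id of each output document holds the userId value rather than an ObjectId.

```javascript
// Sketch of an aggregation pipeline that groups the ratings collection
// by userId and collects (movieId, movieRating) pairs into movieList.
// Run in the mongo shell with: db.ratings.aggregate(pipeline)
var pipeline = [
  {
    $group: {
      _id: "$userId",
      movieList: {
        $push: { movieId: "$movieId", movieRating: "$movieRating" }
      }
    }
  },
  // Copy the group key into a userId field; _id is kept and has the same value.
  { $project: { userId: "$_id", movieList: 1 } },
  // Optional: order the output documents by user.
  { $sort: { userId: 1 } }
];

console.log(JSON.stringify(pipeline, null, 2));
```

Appending a { $out: "<collection>" } stage would write the results to a collection that mongoexport can read; the collection name is up to you.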
Run db.ratings.find(), pick perhaps 5 documents to make a sample, and show us the JSON output you expect from the aggregation operation on that sample. Otherwise it's a futile effort to try to reproduce the problem with the information above. Can you update your question with the sample documents and the expected JSON output?