In MongoDB, I have a collection of documents, each with an array of records that I want to group by runs of identical tags while preserving the natural order:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": ISODate("2019-01-07T09:06:56Z"),
"score": 1
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "ou",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:20Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:37Z"),
"score": 1
}
]
}
I want to group (and aggregate) the records by consecutive runs of the same tag, NOT simply by unique tag values.
Desired output:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0,
"nbRecords":1
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
"score": 1,
"nbRecords": 2
}
]
}
Groupby
It seems that the '$group' aggregation stage in MongoDB groups by the unique values of the field regardless of their position in the array, so non-adjacent records with the same tag end up merged:
db.coll.aggregate(
[
{"$unwind":"$records"},
{"$group":
{
"_id":{
"tag":"$records.tag",
"day":"$day"
},
...
}
}
]
)
Returns
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z"),ISODate("2019-01-07T09:07:20Z"),ISODate("2019-01-07T09:07:37Z")],
"score": 2,
"nbRecords":3
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0
}
]
}
Map/reduce
As I'm currently using the pymongo driver, I implemented the solution in Python using itertools.groupby, which, being a generator, groups the records while respecting their natural order. However, the processing takes so long that I run into server-side cursor timeouts (CursorNotFound error).
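For reference, the client-side grouping looks roughly like the sketch below. It is a simplified version that works on plain dicts rather than on a live cursor; the function name group_consecutive is only illustrative, and summing the scores is an assumption about how each run should be aggregated.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def group_consecutive(records):
    """Merge runs of adjacent records sharing the same tag, keeping their order."""
    grouped = []
    # groupby only merges *adjacent* items with equal keys, so a tag that
    # reappears later in the array starts a new group.
    for tag, run in groupby(records, key=itemgetter("tag")):
        run = list(run)
        grouped.append({
            "tag": tag,
            "unixTime": [r["unixTime"] for r in run],
            "score": sum(r["score"] for r in run),  # assumption: scores are summed
            "nbRecords": len(run),
        })
    return grouped


# Same records as in the sample document above
records = [
    {"tag": "ch", "unixTime": datetime(2019, 1, 7, 9, 6, 56), "score": 1},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "ou", "unixTime": datetime(2019, 1, 7, 9, 7, 6), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 20), "score": 0},
    {"tag": "u", "unixTime": datetime(2019, 1, 7, 9, 7, 37), "score": 1},
]

result = group_consecutive(records)
print([(g["tag"], g["nbRecords"]) for g in result])
# [('ch', 1), ('u', 1), ('ou', 1), ('u', 2)]
```

This gives the grouping I want, but it requires pulling every document over the wire and iterating in Python, which is where the time goes.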
Any idea how to use MongoDB's mapReduce function directly
to perform the equivalent of Python's itertools.groupby()?
Any help would be much appreciated. I'm using pymongo 3.8 and MongoDB 4.0.