In MongoDB, I have a collection of documents, each with an array of records that I want to group by runs of consecutive identical tags, preserving the natural order:

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": ISODate("2019-01-07T09:06:56Z"),
                    "score": 1
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "ou",
                    "unixTime": ISODate("2019-01-07T09:07:06Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:20Z"),
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": ISODate("2019-01-07T09:07:37Z"),
                    "score": 1
                }
            ]
    }

I want to group (and aggregate) the records by consecutive runs of the same tag, NOT simply by unique tag.

Desired output:

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0,
                    "nbRecords": 1
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 1,
                    "nbRecords": 2
                }
            ]
    }

Groupby

It seems that the '$group' aggregation stage in MongoDB merges all records that share the same tag, regardless of their position in the array:

    db.coll.aggregate(
        [
            {"$unwind": "$records"},
            {"$group":
                {
                    "_id": {
                        "tag": "$records.tag",
                        "day": "$day"
                    },
                    ...
                }
            }
        ]
    )

Returns

    {
            "day": "2019-01-07",
            "records": [
                {
                    "tag": "ch",
                    "unixTime": [ISODate("2019-01-07T09:06:56Z")],
                    "score": 1,
                    "nbRecords": 1
                },
                {
                    "tag": "u",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z"), ISODate("2019-01-07T09:07:20Z"), ISODate("2019-01-07T09:07:37Z")],
                    "score": 2,
                    "nbRecords": 3
                },
                {
                    "tag": "ou",
                    "unixTime": [ISODate("2019-01-07T09:07:06Z")],
                    "score": 0
                }
            ]
    }

Map/reduce

As I'm currently using the pymongo driver, I implemented the solution in Python using itertools.groupby, which, being a generator, performs the grouping while respecting the natural order. However, processing takes so long that I run into a server timeout (cursor.NotFound error).

Any idea how to use MongoDB's mapReduce function directly to perform the equivalent of Python's itertools.groupby()?

Help would be much appreciated. I'm using pymongo driver 3.8 and MongoDB 4.0.

2 Answers

Ni! Run through the array of records, adding a new integer index that increments whenever the groupby target changes, then run the Mongo grouping operation on that index.
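A minimal sketch of that indexing step in plain Python, assuming each record is a dict with a "tag" key as in the question. The helper name `add_run_index` and the added field name `run` are hypothetical, chosen only for illustration.

```python
def add_run_index(records, key="tag"):
    """Return copies of the records with an extra integer "run" field.

    The index starts at 0 and increments each time the value under
    `key` differs from the previous record's value.
    """
    indexed = []
    run = -1
    previous = object()  # sentinel that never equals a real tag
    for rec in records:
        if rec[key] != previous:
            run += 1
            previous = rec[key]
        indexed.append(dict(rec, run=run))
    return indexed


records = [
    {"tag": "ch", "score": 1},
    {"tag": "u", "score": 0},
    {"tag": "ou", "score": 0},
    {"tag": "u", "score": 0},
    {"tag": "u", "score": 1},
]

print([r["run"] for r in add_run_index(records)])  # [0, 1, 2, 3, 3]
```

Once the records carry such an index, including it in the grouping key (alongside the day and the tag) should keep non-adjacent runs of the same tag apart.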

1 Comment

Thanks Ale, I will give it a try!

Following @Ale's recommendation, and without any tips on how to do this in MongoDB, I switched back to a Python implementation that solves the cursor.NotFound problem.

I imagine that it could be done inside MongoDB, but this works:

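    import itertools

    # `db` is assumed to be an existing pymongo Database handle.
    for r in db.coll.find():
        session = []
        # groupby only merges consecutive records with the same tag,
        # so the natural order of the array is preserved.
        for tag, time_score in itertools.groupby(r["records"], key=lambda x: x["tag"]):
            time_score = list(time_score)
            session.append({
                "tag": tag,
                "start": time_score[0]["unixTime"],
                "end": time_score[-1]["unixTime"],
                "ca": sum(n["score"] for n in time_score),
                "nb_records": len(time_score),
            })
        # Replace the raw records with the aggregated sessions.
        # (The original wrote to db.col, presumably a typo for db.coll.)
        db.coll.update_one(
            {"_id": r["_id"]},
            {
                "$unset": {"records": ""},
                "$set": {"sessions": session},
            })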
