I'm having performance problems when using the MongoDB Aggregation Framework through C#. An aggregation that runs quickly in the mongo shell takes forever when executed from C#.

Before calling the framework from C#, I ran the following aggregation in the mongo shell to check that everything works:

db.runCommand(
    {
        aggregate: "actions", 
        pipeline : 
        [
            { $match : { CustomerAppId : "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName : "install"}}, 
            { $group : { _id : { CustomerAppId:"$CustomerAppId",ActionDate:"$ActionDate" }, count : { $sum : 1 } }}
        ]
    });

The script executes in under 500 ms and returns the expected result set of around 200 documents. (CustomerAppId is stored as a string in the database; it's not possible to use GUIDs with the aggregation framework.)

Then, I ported the same script to C#:

var pipeline = new BsonArray
        {
            new BsonDocument
                {
                    {
                        "$match", 
                        new BsonDocument
                            {
                                {"CustomerAppId", "f5357224-b1a8-4f1a-8ea2-a06a00ca597a"},
                                {"ActionName", "install"}
                            }
                    },
                    { "$group", 
                        new BsonDocument
                            {
                                { "_id", new BsonDocument
                                             {
                                                 {
                                                     "CustomerAppId","$CustomerAppId"
                                                 },
                                                 {
                                                     "ActionName","$ActionName"
                                                 }
                                             } 

                                },
                                {
                                    "Count", new BsonDocument
                                                 {
                                                     {
                                                         "$sum", 1
                                                     }
                                                 }
                                }
                            } 
                  }
            }
        };


var command = new CommandDocument
{
    { "aggregate", "actions" },
    { "pipeline", pipeline }
};

(Please let me know if there's an easier way to write the aggregation in C# :) )

I execute the command like this:

var result = db.RunCommand(command);

The problem is that it kills the server: CPU and memory usage go way up. When I check db.currentOp(), I can see the aggregate operation, but I eventually have to kill it using db.killOp(1281546):

"opid" : 1281546,
"active" : true,
"secs_running" : 294,
"op" : "query",
"ns" : "database.actions",
"query" : {
        "aggregate" : "actions",
        "pipeline" : [
                {
                        "$match" : {
                                "CustomerAppId" : "f5357224-b1a8-4f1a-8ea2-a06a00ca597a",
                                "ActionName" : "install"
                        },
                        "$group" : {
                                "_id" : {
                                        "CustomerAppId" : "$CustomerAppId",
                                        "ActionName" : "$ActionName"
                                },
                                "Count" : {
                                        "$sum" : 1
                                }
                        }
                }
        ]
},

To me the operation looks completely fine and similar to the script I ran directly from the mongo shell. It feels as if running the aggregation through C# causes MongoDB to miss the index and do a full collection scan over all ~6 million documents in the collection.

Any ideas?

Update: Logs

Thanks to cirrus' suggestion, I enabled verbose logging and used tail to capture the queries. And they are different! So something must be wrong in my C# port. Any ideas on how to format the query correctly?

The query when executed through shell:

Mon Oct  8 15:00:13 [conn1] run command database.$cmd { aggregate: "actions", pipeline: [ { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" } }, { $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, count: { $sum: 1.0 } } } ] }
Mon Oct  8 15:00:13 [conn1] command database.$cmd command: { aggregate: "actions", pipeline: [ { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" } }, { $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, count: { $sum: 1.0 } } } ] } ntoreturn:1 keyUpdates:0 locks(micros) r:27944 reslen:12705 29ms

And the query when executed through C#:

Mon Oct  8 15:00:16 [conn8] run command database.$cmd { aggregate: "actions", pipeline: [ { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" }, $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, Count: { $sum: 1 } } } ] }

The second line is missing, presumably because the query never finishes.

Here are the two log lines again for easier comparison, shell first and C# second:

Mon Oct  8 15:00:13 [conn1] run command database.$cmd { aggregate: "actions", pipeline: [ { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" } }, { $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, count: { $sum: 1.0 } } } ] }
Mon Oct  8 15:00:16 [conn8] run command database.$cmd { aggregate: "actions", pipeline: [ { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" }, $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, Count: { $sum: 1 } } } ] }
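
The structural difference between the two logged pipelines can also be checked mechanically. Below is a minimal JavaScript sketch (runnable in Node) with both pipelines transcribed from the log lines above. `stageOperators` and `malformedStages` are hypothetical helper names made up for this illustration, not part of MongoDB or any driver.

```javascript
// Pipeline as logged for the shell run: two stage documents.
const shellPipeline = [
  { $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" } },
  { $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, count: { $sum: 1 } } }
];

// Pipeline as logged for the C# run: one stage document holding both operators.
const csharpPipeline = [
  {
    $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" },
    $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, Count: { $sum: 1 } }
  }
];

// Hypothetical helper: list the top-level operator keys of every stage.
function stageOperators(pipeline) {
  return pipeline.map(stage => Object.keys(stage));
}

// Hypothetical helper: indexes of stages that do not hold exactly one operator.
function malformedStages(pipeline) {
  return pipeline
    .map((stage, index) => (Object.keys(stage).length === 1 ? -1 : index))
    .filter(index => index !== -1);
}

console.log(stageOperators(shellPipeline));   // two stages, one operator each
console.log(stageOperators(csharpPipeline));  // one stage with both $match and $group
console.log(malformedStages(csharpPipeline)); // stage 0 is flagged
```

Run against the two log lines, the shell pipeline has nothing flagged, while the C# pipeline has its only stage flagged because it carries two operators.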
  • You know you've used ActionName in the C# version as opposed to ActionDate in the shell version, right? Beyond that, turn logging on with "verbose=true" in your config file and tail.exe the log file. It will show you the actual query it's executing on the DB. Commented Oct 8, 2012 at 13:40
  • Actually no! I swear I triple checked that things match but I still managed to miss it. But unfortunately the execution is still slow. The index contains ActionName, CustomerAppId and ActionDate. Grouping from shell with either AppId&Date or AppId&ActionName is fast, but from the code they are both slow. Thanks for the tip on the verbose. I'll have to check that out. Commented Oct 8, 2012 at 14:48
  • Don't forget you can upvote helpful comments as well as answers ;) Commented Oct 8, 2012 at 17:15
  • Using Mongo 2.4 I can use the aggregation framework with binary represented guids Commented Dec 6, 2013 at 16:59

1 Answer

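It turns out I was building the pipeline object incorrectly. I had put $match and $group into the same BsonDocument, so the pipeline array contained a single document with both operators, which is exactly what the C# log line shows. Each pipeline stage has to be its own BsonDocument instance. The following code seems to produce the correct output: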

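// Each pipeline stage is its own BsonDocument holding exactly one operator.
var pipeline = new BsonArray
{
    // Stage 1: keep only the "install" actions of a single app.
    new BsonDocument
    {
        {
            "$match",
            new BsonDocument
            {
                { "CustomerAppId", "f5357224-b1a8-4f1a-8ea2-a06a00ca597a" },
                { "ActionName", "install" }
            }
        }
    },
    // Stage 2: count the matching documents per (CustomerAppId, ActionDate).
    new BsonDocument
    {
        {
            "$group",
            new BsonDocument
            {
                {
                    "_id",
                    new BsonDocument
                    {
                        { "CustomerAppId", "$CustomerAppId" },
                        { "ActionDate", "$ActionDate" }
                    }
                },
                { "Count", new BsonDocument { { "$sum", 1 } } }
            }
        }
    }
};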
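
As a language-neutral illustration of the same fix, here is a small JavaScript sketch (runnable in Node) that splits a single document holding several operators into one document per stage. `splitStages` is a made-up helper for this example, not a MongoDB or driver API, and it assumes the operators should run in the order their keys appear.

```javascript
// The single merged document that the incorrect C# code sent as its only stage.
const merged = {
  $match: { CustomerAppId: "f5357224-b1a8-4f1a-8ea2-a06a00ca597a", ActionName: "install" },
  $group: { _id: { CustomerAppId: "$CustomerAppId", ActionDate: "$ActionDate" }, Count: { $sum: 1 } }
};

// Hypothetical helper: emit one single-operator document per key, in key order.
function splitStages(mergedStage) {
  return Object.entries(mergedStage).map(([operator, spec]) => ({ [operator]: spec }));
}

const pipeline = splitStages(merged);

console.log(pipeline.length);           // 2
console.log(Object.keys(pipeline[0]));  // only $match
console.log(Object.keys(pipeline[1]));  // only $group
```

The resulting array has the same shape as the pipeline in the shell log: two elements, each with a single operator.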

I really hope there's a C# Linq provider for MongoDB Aggregation Framework in the pipeline :)

3 Comments

Me too. I suspected verbose might help, because I've been there myself recently. The lack of aggregate support in the driver is beyond inconvenient. It's more fragile and means ugly parameter substitution. There was a ticket, jira.mongodb.org/browse/CSHARP-383, raised for aggregate support with LINQ, but it got closed because it was possible with the method you used above. Personally I don't consider that satisfactory, and neither do you by the sound of it. Having read the comments on the ticket, I don't understand why it was closed. It needs to be re-opened IMHO.
I ended up writing a tutorial which shows examples of using the aggregation framework with C#. Hope people can copy&paste these and use them for building their own pipelines. mikaelkoskinen.net/post/…
@cirrus there is an open issue to support linq to aggregation framework see the issue here jira.mongodb.org/browse/…
