
I have a MongoDB collection whose structure is as follows:

{
"_id" : "mongo",
"log" : [
    {
        "ts" : ISODate("2011-02-10T01:20:49Z"),
        "visitorId" : "25850661"
    },
    {
        "ts" : ISODate("2014-11-01T14:35:05Z"),
        "visitorId" : NumberLong(278571823)
    },
    {
        "ts" : ISODate("2014-11-01T14:37:56Z"),
        "visitorId" : NumberLong(0)
    },
    {
        "ts" : ISODate("2014-11-04T06:23:48Z"),
        "visitorId" : NumberLong(225200092)
    },
    {
        "ts" : ISODate("2014-11-04T06:25:44Z"),
        "visitorId" : NumberLong(225200092)
    }
],
"uts" : ISODate("2014-11-04T06:25:43.740Z")
}

"mongo" is a search term and "ts" indicates the timestamp when it was searched on the website.

"uts" indicates the last time it was searched.

So the search term "mongo" was searched 5 times on our website.

I need to get top 50 most searched items in past 3 months.

I am no expert in aggregation in MongoDB, but I was trying something like this to at least get the data for the past 3 months:

db.collection.aggregate({$group:{_id:"$_id",count:{$sum:1}}},{$match:{"log.ts":{"$gte":new Date("2014-09-01")}}})

It gave me this error:

exception: sharded pipeline failed on shard DSink9: { errmsg: "exception: aggregation result exceeds maximum document size (16MB)", code: 16389

Can anyone please help me?

UPDATE

I was able to write a query, but it gives me a syntax error.

db.collection.aggregate(
{$unwind:"$log"},
{$project:{log:"$log.ts"}},
{$match:{log:{"$gte" : new Date("2014-09-01"),"$lt" : new Date("2014-11-04")}}},
{$project:{_id:{val:{"$_id"}}}},
{$group:{_id:"$_id",sum:{$sum:1}}})
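For reference, here is a minimal sketch of what that pipeline might look like with the syntax error fixed: `{val:{"$_id"}}` is not a valid expression, and the extra `$project` stages can simply be dropped, since `$group` can reference `$_id` directly. The date range is the same one used in the question.

```javascript
// Sketch of a corrected version of the pipeline from the update above.
// The invalid stage {$project:{_id:{val:{"$_id"}}}} is removed entirely.
var pipeline = [
  { $unwind: "$log" },
  { $match: { "log.ts": { $gte: new Date("2014-09-01"),
                          $lt:  new Date("2014-11-04") } } },
  { $group: { _id: "$_id", sum: { $sum: 1 } } }
];

// In the mongo shell:
// db.collection.aggregate(pipeline)
```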

2 Answers

You are exceeding the maximum document size in a result, but generally that is an indication that you are "doing it wrong", particularly given your example of searching for the term "mongo" in your stored data between two dates:

db.collection.aggregate([
   // Always match first, it reduces the workload and can use an index here only.
   { "$match": { 
       "_id": "mongo",
       "log.ts": {
           "$gte": new Date("2014-09-01"), "$lt": new Date("2014-11-04")
       }
   }},

   // Unwind the array to de-normalize as documents
   { "$unwind": "$log" },

   // Get the count within the range, so match first to "filter"
   { "$match": { 
       "log.ts": {
           "$gte": new Date("2014-09-01"), "$lt": new Date("2014-11-04")
       }
   }},

   // Group the count on `_id`
   { "$group": {
       "_id": "$_id",
       "count": { "$sum": 1 }
   }}
]);

5 Comments

Thanks. It worked like a charm. Could you please tell me what was incorrect in the query I wrote in the "Update" part of my question? It is really driving me crazy. Why is it giving me a syntax error?
@Neil You are largely breaking this with assumptions about $project and what it is used for. Which I can thank you for reminding me of, as I'm writing an article on common mistakes right now. $project and $group have a specific function of "explicitly" specifying the fields that follow in the pipeline. If you don't name a field, then it is not there for the next pipeline stage. Think of the stages like the unix pipe "|", passing output to input.
Ok. Also, how do I add sort and limit to your query, as I need the top 50 values? Thanks for your help.
@Neil I know it's tempting to ask more questions of the person who answered your question, but you really need to ask another question. That way you make your question clear, and you might even gain some rep for it. A hint, though: if you are looking for the limit on a single grouping value, it's not hard. On multiple values, it is hard. But both questions have been asked before.
Anyway, I will figure it out. If possible, do share the article about common mistakes with the community for the benefit of all.
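Regarding the follow-up about the top 50: since the `$group` stage in the answer produces one document per search term, ranking them is the simple single-grouping case, just append `$sort` and `$limit` stages. A sketch, assuming the same hypothetical date range as the question, with the `"_id": "mongo"` filter dropped so that all terms are ranked:

```javascript
// Top-50 sketch: the accepted answer's pipeline extended with $sort and $limit.
var range = { $gte: new Date("2014-09-01"), $lt: new Date("2014-11-04") };

var topPipeline = [
  { $match: { "log.ts": range } },   // filter documents first; can use an index
  { $unwind: "$log" },               // de-normalize the log array
  { $match: { "log.ts": range } },   // filter the unwound entries to the range
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } },          // most-searched terms first
  { $limit: 50 }                     // keep only the top 50
];

// In the mongo shell:
// db.collection.aggregate(topPipeline)
```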

Your aggregation result exceeds MongoDB's maximum document size. You can use the allowDiskUse option to prevent this; as of mongo shell version 2.6 it will no longer throw an exception. Look at the aggregate documentation. You can also optimize your query to decrease the pipeline result; for that, look at this related question on aggregation results.
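To illustrate the suggestion above, `allowDiskUse` is passed as an option object to `aggregate()` (available from MongoDB 2.6); it lets pipeline stages spill temporary data to disk rather than failing on in-memory limits. A minimal sketch:

```javascript
// Option object for aggregate(); allowDiskUse lets stages write temporary
// data to disk instead of failing when they exceed in-memory limits.
var options = { allowDiskUse: true };

// In the mongo shell (pipeline from the accepted answer):
// db.collection.aggregate(
//   [ { $unwind: "$log" },
//     { $group: { _id: "$_id", count: { $sum: 1 } } } ],
//   options
// )
```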

