Documents in our collection have the following structure:

Object: //below is object metadata from mongo
    _id
    created_at
    lang
    source
    object: //this is real object data from our db
        id
        created_at
        object_class

I ran the following query on this collection:

db.getCollection('foo').aggregate(
    [
    {
        $match: {
            lang: 'bar', 
            pushed_at:{
            $gte: new ISODate("2015-11-09T00:00:00.000Z"),
            $lt: new ISODate("2015-11-10T00:00:00.000Z")
            }
        }
    },
    {
        $group: {
            _id: "$object.id",
            occurences: {$sum: 1}
        }
    },
    {
        $match: {
            occurences: {$gt: 1}
        }
    }
])

Which returned: (screenshot of the output, not reproduced here)

It appears that we have duplicate entries in our collection. By duplicate I mean documents with the same Object.object.id. I'd like to remove the redundant occurrences using the results of the aggregate query above. Note that I don't want to delete everything, just the redundant documents, so that each object.id is left with occurences: 1.

How can I do this, ideally using the results of the aggregation?

  • Can you edit your question to include a Minimal, Complete, and Verifiable example, i.e. some sample data just enough to produce the desired result and the expected output from that sample data? Commented Nov 10, 2015 at 11:25

1 Answer

I think you can try this in the shell:

db.foo.aggregate(
    [
    {
        $match: {
            lang: 'bar', 
            pushed_at:{
            $gte: new ISODate("2015-11-09T00:00:00.000Z"),
            $lt: new ISODate("2015-11-10T00:00:00.000Z")
            }
        }
    },
    {
        $group: {
            _id: "$object.id",
            occurences: {$sum: 1}
        }
    },
    {
        $match: {
            occurences: {$gt: 1}
        }
    }
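]).forEach(function (x) {
    // On MongoDB 2.6 and later the shell's aggregate() returns a cursor, so forEach
    // is called on it directly; the 2.4 shell returned a document with a .result array.
    // Remove all but one document for each duplicated object.id.
    for (var i = 0; i < x.occurences - 1; i++) {
        db.foo.remove({"object.id": x._id}, true); // true = justOne
    }
});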
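As a possible alternative, here is a minimal sketch that deletes by _id instead of by object.id. It assumes the $group stage is extended with ids: {$push: "$_id"} so that each group carries the _id values of its documents. The helper name redundantIds is hypothetical and not part of MongoDB. Deleting by _id would also avoid touching documents with the same object.id that fall outside the lang and pushed_at filter, which the remove() by object.id does not guarantee.

```javascript
// Hypothetical helper: given groups shaped like { _id: <object.id>, ids: [<_id>, ...] },
// return the _id values to delete, keeping the first document of each group.
function redundantIds(groups) {
    var toRemove = [];
    groups.forEach(function (g) {
        // slice(1) skips the first _id, which belongs to the document we keep
        g.ids.slice(1).forEach(function (id) {
            toRemove.push(id);
        });
    });
    return toRemove;
}

// Assumed usage in the mongo shell (2.6+, where aggregate() returns a cursor):
// var groups = db.foo.aggregate([ /* same pipeline, with ids: {$push: "$_id"} in $group */ ]).toArray();
// db.foo.remove({_id: {$in: redundantIds(groups)}});
```

Which document of each group is kept depends on the order in which $push collected the _id values, so add a $sort stage before $group if a specific one (for example the oldest) should survive.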
7 Comments

Thanks! That helps a lot!
Sorry for side question but I wonder if, in general, it is possible to call methods like find() on previous query results?
I'm not sure I understand, can you be more specific?
Let's say that the above aggregate produces results that I want to filter using another find() call, and in the end do something with them, like: db.col.aggregate(foo).find(using foo).count().
But you can use a series of if statements in your forEach to match multiple conditions.