I somehow created duplicates of every single entry in my database. There are currently 176039 documents and counting, and half of them are duplicates. Each document is structured like so:

_id : 5b41d9ccf10fcf0014fe8917
originName : "Hartsfield Jackson Atlanta International Airport"
destinationName : "Antigua"
totalDuration : 337

In the MongoDB Compass Community app for Mac, under the Aggregations tab, I was able to find the duplicates using this pipeline:

[
    {$group: {
        _id: {originName: "$originName", destinationName: "$destinationName"},
        count: {$sum: 1}}},
    {$match: {count: {"$gt": 1}}}
]

I'm not sure how to move forward and delete the duplicates at this point. I'm assuming it has something to do with $out.

Edit: Something I didn't notice until now is that the totalDuration values within each pair of duplicates are actually different.

  • Add {$project:{_id:0, "originName":"$_id.originName", "destinationName":"$_id.destinationName"}},{ $out : collectionname }. This will replace the documents in your current collection with the documents from the aggregation pipeline. If you need totalDuration in the collection, then add that field in both the group and project stages before running the pipeline. Commented Aug 7, 2018 at 22:11
  • This worked exactly as expected. Can you turn this into an answer so I can upvote it? Also, I made an edit to the question. The totalDuration values are actually different for some reason. If I add totalDuration to $group, then $match will not find anything. Commented Aug 8, 2018 at 4:03
  • Do you want to keep totalDuration in your output? If yes, do you want both values? Commented Aug 8, 2018 at 10:49
  • Yes, but only one. The first one is fine, although it doesn’t matter which. Commented Aug 8, 2018 at 10:51
  • Add totalDuration:{$first:"$totalDuration"} in the group stage and include it in the $project stage as totalDuration:1. Commented Aug 8, 2018 at 10:53

1 Answer

Add these stages to the end of the pipeline:

{$project:{_id:0, "originName":"$_id.originName", "destinationName":"$_id.destinationName"}},
{ $out : collectionname } 

This will replace the documents in your current collection with the documents produced by the aggregation pipeline. If you need totalDuration in the collection, add that field to both the $group and $project stages before running the pipeline.
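
Putting the pieces from the question and the comment thread together, the whole pipeline might look like the sketch below. It assumes a hypothetical collection name ("flights"; substitute your own), keeps one totalDuration per pair via $first as suggested in the comments, and leaves out the $match stage from the question, since with $match in place any document that has no duplicate would not be written back by $out. Without a preceding $sort, which totalDuration counts as "first" is not guaranteed, which is acceptable here because either value will do. Because $out overwrites the target collection, it is probably worth backing up the collection or writing to a different name first.

```javascript
// Hypothetical collection name, not taken from the question; substitute your own.
const collectionName = "flights";

const dedupePipeline = [
  // One group per (originName, destinationName) pair,
  // keeping the totalDuration of the first document seen in each group.
  { $group: {
      _id: { originName: "$originName", destinationName: "$destinationName" },
      totalDuration: { $first: "$totalDuration" }
  } },
  // Flatten the group key back into top-level fields; dropping _id lets
  // MongoDB assign a fresh _id to each written document.
  { $project: {
      _id: 0,
      originName: "$_id.originName",
      destinationName: "$_id.destinationName",
      totalDuration: 1
  } },
  // Replace the contents of the target collection with the result.
  { $out: collectionName }
];

// In the mongo shell this would be run as:
//   db.getCollection(collectionName).aggregate(dedupePipeline)
```

The same array can be entered stage by stage in the Compass Aggregations tab.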
