Delete all but one duplicate from a mongo db

Question

So I made the mistake of saving a lot of documents twice because I messed up my document id. Because I did an insert, I multiplied my documents every time I saved them. So I want to delete all duplicates except the first one that I wrote. Luckily the documents have an implicit unique key (match._id), and I should be able to tell which one was first, because I am using the ObjectId.

The documents look like this:

{
  _id: "5e8e2d28ca6e660006f263e6"
  match : {
    _id:  2345
    ...
  }
  ...
}

So, right now I have an aggregation that tells me which elements are duplicated and stores them in a collection. There is surely a more elegant way, but I am still learning.

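[
  // Sort by ObjectId so the oldest copy comes first in each group
  {$sort: {_id: 1}},
  {$group: {
    _id: "$match._id",
    duplicateIds: {$push: "$_id"},
    count: {$sum: 1}
  }},
  {$match: {
    count: { $gt: 1 }
  }},
  // Skip the first (oldest) id and keep up to 1000 of the rest for deletion
  {$addFields: {
    deletableIds: { $slice: ["$duplicateIds", 1, 1000 ] }
  }},
  {$out: 'DeleteableIds'}
]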

Now I do not know how to proceed, as there does not seem to be a "delete" operation in aggregations, and I do not want to write that temp data to a db just so I can write a delete command with it, as I want to delete them in one go. Is there any other way to do this? I am still learning MongoDB and feel a little bit overwhelmed :/

whoami - fakeFaceTrueSoul · Accepted Answer · 2020-04-13 21:23:19Z

1

Rather than doing all of that, you can just pick the first document in each group for _id: "$match._id" and make it the root document. Also, I don't think you need to sort in your case:

db.collection.aggregate([
  {
    $group: {
      _id: "$match._id",
      doc: {
        $first: "$$ROOT"
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: "$doc"
    }
  }, {$out: 'DeleteableIds'}
])

Test : MongoDB-Playground
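
To see what the $group / $first / $replaceRoot pipeline keeps, here is a minimal in-memory sketch in plain JavaScript. The helper name keepFirstPerKey and the sample documents are hypothetical, not part of the answer, and the code only imitates the pipeline on an array. Note that $first returns whichever document reaches the group first, so the sketch sorts by _id up front, on the assumption that the oldest copy should survive, as the question asks.

```javascript
// Hypothetical helper: imitates {$sort: {_id: 1}} followed by
// {$group: {_id: "$match._id", doc: {$first: "$$ROOT"}}} and $replaceRoot.
function keepFirstPerKey(docs) {
  // Sort a copy by _id so "first" means the oldest inserted copy.
  const sorted = [...docs].sort((a, b) =>
    a._id < b._id ? -1 : a._id > b._id ? 1 : 0
  );
  const firstByKey = new Map();
  for (const doc of sorted) {
    // Only the first document seen for each match._id is kept.
    if (!firstByKey.has(doc.match._id)) {
      firstByKey.set(doc.match._id, doc);
    }
  }
  return [...firstByKey.values()];
}

// Made-up sample data: match._id 2345 was saved three times.
const sample = [
  { _id: "id03", match: { _id: 2345 } },
  { _id: "id01", match: { _id: 2345 } },
  { _id: "id02", match: { _id: 7777 } },
  { _id: "id04", match: { _id: 2345 } },
];

const kept = keepFirstPerKey(sample);
console.log(kept.map((d) => d._id)); // [ 'id01', 'id02' ]
```

If the order of the incoming documents is not guaranteed, keeping the $sort stage in front of $group is the safer choice.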


1 Comment

modmoto Over a year ago

That is a great solution, gives me a clean collection that I can swap out. Thanks a lot =)

Kevin Smith · Accepted Answer · 2020-04-13 21:28:40Z

1

I think you're on the right track; however, to delete the duplicates you've found, you can use a bulk write on the collection.

So if we imagine your aggregation query saved the following in the DeleteableIds collection:

> db.DeleteableIds.insertMany([
... {deletableIds: [1,2,3,4]},
... {deletableIds: [103,35,12]},
... {deletableIds: [345,311,232,500]}
... ]);

We can now take them and write a bulk write command:

const bulkwrite = db.DeleteableIds.find().map(x => ({ deleteMany : { filter: { _id: { $in: x.deletableIds } } } }))

Then we can execute that against the database:

> db.collection1.bulkWrite(bulkwrite)

This will then delete all the duplicates.
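
The mapping step can also be written as a small plain-JavaScript function, which makes it easy to inspect the operations before running them. This is only a sketch: buildDeleteOps is a hypothetical name, and it assumes each document in DeleteableIds has a deletableIds array as in the insertMany example above.

```javascript
// Hypothetical helper: turns the documents stored in DeleteableIds into
// one deleteMany operation per duplicate group, ready for bulkWrite.
function buildDeleteOps(groups) {
  return groups
    // Skip groups with nothing to delete (missing or empty array).
    .filter((g) => Array.isArray(g.deletableIds) && g.deletableIds.length > 0)
    .map((g) => ({
      deleteMany: { filter: { _id: { $in: g.deletableIds } } },
    }));
}

// Made-up ids taken from the insertMany example, plus one empty group.
const groups = [
  { deletableIds: [1, 2, 3, 4] },
  { deletableIds: [103, 35, 12] },
  { deletableIds: [] },
];

const ops = buildDeleteOps(groups);
console.log(JSON.stringify(ops[0]));
// {"deleteMany":{"filter":{"_id":{"$in":[1,2,3,4]}}}}
console.log(ops.length); // 2
```

In the shell this could then be called as db.collection1.bulkWrite(buildDeleteOps(db.DeleteableIds.find().toArray())), where toArray() makes sure a plain array is passed in.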
