0

So I mad the mistake and saved a lot of doduments twice because I messed up my document id. Because I did a Insert, i multiplied my documents everytime I saved them. So I want to delete all duplicates except the first one, that i wrote. Luckilly the documents have an implicit unique key (match._id) and I should be able to tell what the first one was, because I am using the object id.

The documents look like this:

{
  _id: "5e8e2d28ca6e660006f263e6"
  match : {
    _id:  2345
    ...
  }
  ...
}

So, right now I have a aggregation that tells me what elements are duplicated and stores them in a collection. There is for sure a more elegant way, but I am still learning.

[{$sort: {"$_id": 1},
{$group: {
  _id: "$match._id",
  duplicateIds: {$push: "$_id"},
  count: {$sum: 1}
}},
{$match: {
  count: { $gt: 1 }
}}, {$addFields: {
  deletableIds: { $slice: ["$duplicateIds", 1, 1000 ] }
}},
{$out: 'DeleteableIds'}]

Now I do not know how to proceed further, as it does not seem to have a "delete" operation in aggregations and I do not want to write those temp data to a db just so I can write a delete command with that, as I want to delete them in one go. Is there any other way to do this? I am still learning with mongodb and feel a little bit overwhelmed :/

2 Answers 2

1

Rather than doing all of those you can just pick first document in group for each _id: "$match._id" & make it as root document. Also, I don't think you need to do sorting in your case :

db.collection.aggregate([
  {
    $group: {
      _id: "$match._id",
      doc: {
        $first: "$$ROOT"
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: "$doc"
    }
  }, {$out: 'DeleteableIds'}
])

Test : MongoDB-Playground

Sign up to request clarification or add additional context in comments.

1 Comment

That is a great solution, gives me a clean collection that I can swap out. Thanks a lot =)
1

I think you're on the right track, however, to delete the duplicates you've found you can use a bulk write on the collection.

So if we imagine you aggregation query saved the following in the the DeleteableIds collection

> db.DeleteableIds.insertMany([
... {deletableIds: [1,2,3,4]},
... {deletableIds: [103,35,12]},
... {deletableIds: [345,311,232,500]}
... ]);

We can now take them and write a bulk write command:

const bulkwrite = db.DeleteableIds.find().map(x => ({ deleteMany : { filter: { _id: { $in: x.deletableIds } } } }))

then we can execute that against the database.

> db.collection1.bulkWrite(bulkwrite)

this will then delete all the duplicates.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.