I'm fairly new to MongoDB queries and I'm struggling to work out whether this is possible and, if so, how to solve it.

The documents in this collection consist of several fields plus a list of objects. Each object in the list contains regular string fields.

I have now realised that the lists contain duplicates (caused by wrong logic in my code) that have to be deleted. I can't search through and clean up about 10,000 DB entries by hand, so I thought there must be a query for that.

In the example below, the first and second objects are duplicates because string_1 and string_2 have the same values in both, so one of the two has to be deleted:

{
    "string" : "",
    "string" : "",
    "string" : "",

    "list of objects" : [
        {
            "string_1" : "2",
            "string_2" : "2",
            "string_3" : "1",
        },
        {
            "string_1" : "2",
            "string_2" : "2",
            "string_3" : "4",
        },
        {
            "string_1" : "3",
            "string_2" : "5",
            "string_3" : "3",
        },
    ]
}

The desired outcome keeps the first of the duplicates (objects count as duplicates when both string_1 and string_2 match):

{
    "string" : "",
    "string" : "",
    "string" : "",

    "list of objects" : [
        {
            "string_1" : "2",
            "string_2" : "2",
            "string_3" : "1",
        },
        {
            "string_1" : "3",
            "string_2" : "5",
            "string_3" : "3",
        },
    ]
}

Any help is appreciated.

5 Comments
  • So if two items share an identical value, one of them should be deleted? Do the field names matter here, or only the values? If the 3rd item had the value "2" in "string_3", should it be deleted as well? Commented Jul 28, 2022 at 14:04
  • Perhaps if you showed the desired updated document, the update logic would be a bit clearer. Commented Jul 28, 2022 at 17:08
  • @nimrodserok If string_1 and string_2 are both equal across two or more items, then all but one should be deleted. It matters that it is specifically string_1 and string_2, not just any values. Commented Jul 29, 2022 at 15:58
  • @rickhg12hs I updated the question with an output example. Commented Jul 29, 2022 at 15:58
  • So what is the condition for deleting an item? Must it have string_1 and string_2 identical to another item? Or, if any string (for example, string_3) is identical in two items, should one of them be deleted? Commented Jul 29, 2022 at 16:56

1 Answer

One option is to use $unwind and $group to build a list of items with a unique combination of properties.

The pipeline below merges items whose string_1 and string_2 both match those of another item (not items where string_1 equals string_2), because these two properties are used in the _id of the first $group stage. string_3 is not checked for uniqueness. If you need that as well, add string_3 to the _id of the first $group:

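db.collection.aggregate([
  // Emit one document per array item and remember its original position.
  {
    $unwind: {
      path: "$list of objects",
      includeArrayIndex: "index"
    }
  },
  // Keep one item per (parent document, string_1, string_2) combination.
  // origId is part of the key so that items are never merged across documents.
  {
    $group: {
      _id: {
        origId: "$_id",
        string_1: "$list of objects.string_1",
        string_2: "$list of objects.string_2"
      },
      string_3: {$first: "$list of objects.string_3"},
      string: {$first: "$string"},
      index: {$first: "$index"}
    }
  },
  // Restore the original item order before rebuilding the array.
  {$sort: {index: 1}},
  // Reassemble one document per original _id.
  {$group: {
      _id: "$_id.origId",
      "list of objects": {
        $push: {
          string_1: "$_id.string_1",
          string_2: "$_id.string_2",
          string_3: "$string_3"
        }
      },
      string: {$first: "$string"}
    }
  }
])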

See how it works on the playground example
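
To sanity-check the rule on a sample document before running anything against the database, the same keep-first logic can be mirrored in plain JavaScript. This is only a sketch: dedupeItems is a hypothetical helper name, it touches no database, and it assumes every item has string_1 and string_2 fields as in the question.

```javascript
// Plain-JavaScript mirror of the keep-first rule (no database access).
// dedupeItems is a hypothetical helper name, not a MongoDB API.
function dedupeItems(items) {
  const seen = new Set();
  return items.filter((item) => {
    // Serialise the pair so that ("2", "2") and ("22", "") cannot collide.
    const key = JSON.stringify([item.string_1, item.string_2]);
    if (seen.has(key)) {
      return false; // an earlier item already has this pair, drop this one
    }
    seen.add(key);
    return true; // first occurrence of this pair, keep it
  });
}

// The "list of objects" array from the question.
const sample = [
  {string_1: "2", string_2: "2", string_3: "1"},
  {string_1: "2", string_2: "2", string_3: "4"},
  {string_1: "3", string_2: "5", string_3: "3"}
];

const result = dedupeItems(sample);
// The kept items are the ones with string_3 "1" and "3".
console.log(result.map((item) => item.string_3));
```

Comparing this output with what the pipeline returns for the same document is a quick way to confirm that both apply the rule you actually want.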

If you want to update your existing collection, add this stage at the end of the pipeline:

{$merge: {into: "<your collection name>"}}

and replace <your collection name> with your actual collection name.
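
As a rough sketch, the $merge stage can also be written with its options spelled out. The collection name "MyCollection" is only an assumption taken from the comment thread, and whenNotMatched: "discard" is a cautious choice of mine rather than part of the original answer. The on and whenMatched values shown are the documented defaults. Note that writing back into the same collection that is being aggregated requires MongoDB 4.4 or later.

```javascript
// Assumed collection name (from the comment thread); replace with your own.
const targetCollection = "MyCollection";

// Final stage to append to the pipeline above.
const mergeStage = {
  $merge: {
    into: targetCollection,    // target collection name as a string
    on: "_id",                 // match output documents to existing ones by _id (default)
    whenMatched: "merge",      // overwrite only the fields the pipeline outputs (default)
    whenNotMatched: "discard"  // assumption: never insert new documents
  }
};

// Usage in the shell (shown as a comment, not executed here):
// db.getCollection(targetCollection).aggregate([ /* stages above */, mergeStage ]);
console.log(JSON.stringify(mergeStage));
```

Because whenMatched is "merge", fields that the pipeline does not output are left as they are in the stored documents. Trying the pipeline on a copy of the collection first is a reasonable precaution.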

3 Comments

Thank you so much. I played around a little in the playground and was able to adapt your solution to my exact DB content. Now I realise that I don't know enough about MongoDB to apply it to my DB as well. I used mongo.exe to open the shell, switched to the correct DB and pasted the query (with getCollection('MyCollection') instead of collection). It ran and produced output, but when I check the DB in Robo 3T nothing has changed :( Is there something else I need to do for the changes to be applied?
This is a query that reads data; it does not update the DB. If you want to update the DB, you can add a $merge stage at the end to write the results back to your collection.
The answer is updated accordingly.
