I have documents like the one illustrated below in my MongoDB collection, several hundred of them. The issue is that a few of them contain duplicate entries within the embedded array. In this illustration, the array must not contain both the entry_2 object and the identical entry_2 object that follows it. I want to delete one of the two objects in the array whose name has the value "entry_2".

{   
   "id": null,
   "name": "",
   "array": [
       {
             "name": "entry_1"
       },
       {
             "name": "entry_2"

       },
       {
             "name": "entry_2"

       },
       {
             "name": "entry_3"

       }
   ]
}

Hence, my JSON should look like the following after the Query:

{   
   "id": null,
   "name": "",
   "array": [
       {
             "name": "entry_1"
       },
       {
             "name": "entry_2"

       },
       {
             "name": "entry_3"

       }
   ]
}

I tried to browse SO and read http://docs.mongodb.org/manual/tutorial/query-documents/#exact-match-on-the-embedded-document but I couldn't get a solution.


-- EDIT --

I have to use the option { allowDiskUse: true } and don't know how to add it to the query. Furthermore, I tried to adapt the query to my actual use case, where the documents have the following structure:

{
    "_id": {
        "$oid": "556ccf6f59bbda5ea20a8884"
    },
    "id": 1159,
    "description": "Cheese, goat, soft type",
    "tags": [],
    "manufacturer": "",
    "group": "Dairy and Egg Products",
    "portions": [
        {
            "unit": "oz",
            "grams": 28.35,
            "amount": 1
        }
    ],
    "nutrients": [
        {
            "description": "Protein",
            "group": "Composition",
            "value": 18.52,
            "units": "g"
        },
        {
            "group": "Composition",
            "value": 21.08,
            "units": "g",
            "description": "Total lipid (fat)"
        },
        {
            "description": "Protein",
            "group": "Composition",
            "value": 18.52,
            "units": "g"
        }
    ]
}

Based on the answer below I tried:

var pipeline = [
    {
        "$unwind": "$nutrients"
    },
    {
       "$group": {
           "_id": "$_id",
           "id": { "$first": "$id" }
           "description": { "$first": "$description" },
           "tags" : { "$first": "$tags" },
           "manufacturer" : { "$first": "$manufacturer" },
           "group" : { "$first": "$group" },     
           "portions" : { "$first": "$portions" },
           "nutrients": {
               "$addToSet": "$nutrients"
           }        
       }
    }
],
options = { "allowDiskUse": true };
db.collection.aggregate(pipeline, options);

I get the error message: "unexpected String". I suppose it has something to do with the "_id" object and the "tags" array.

    You are getting the error because your pipeline is missing a comma , after the "id": { "$first": "$id" } expression in the $group operator expression. Commented Jun 5, 2015 at 10:45
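
For reference, here is a sketch of the pipeline from the question with the comma from the comment above restored; nothing else is changed. The `typeof db` guard and the `printjson` call are additions for illustration: `db` only exists inside the mongo shell, and `db.collection` stands in for the real collection name. Keep in mind that $addToSet treats two embedded documents as duplicates only if they have the same fields and values in the same order, and that it does not guarantee the order of the resulting array.

```javascript
// Pipeline from the question; the only fix is the comma after the "id" field.
var pipeline = [
    { "$unwind": "$nutrients" },
    { "$group": {
        "_id": "$_id",
        "id": { "$first": "$id" },
        "description": { "$first": "$description" },
        "tags": { "$first": "$tags" },
        "manufacturer": { "$first": "$manufacturer" },
        "group": { "$first": "$group" },
        "portions": { "$first": "$portions" },
        // Collects each distinct nutrient sub-document once per group.
        "nutrients": { "$addToSet": "$nutrients" }
    } }
];
var options = { "allowDiskUse": true };

// `db` is a global in the mongo shell only; the guard (an assumption made for
// this sketch) lets the snippet load in other JavaScript environments as well.
if (typeof db !== "undefined") {
    db.collection.aggregate(pipeline, options).forEach(printjson);
}
```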

1 Answer

On MongoDB 3.4 or later, use $reduce to remove the duplicates from the array as follows:

db.collection.aggregate([
    { $addFields: {
        array: {
            $reduce: {
                input: "$array",
                initialValue: [],
                in: {
                    $cond: [
                        { $in: ["$$this.name", "$$value.name"] }, 
                        "$$value",
                        { $concatArrays: ["$$value", ["$$this"]] }
                    ]
                }
            }
        }
    } }
])
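
To make the logic of the $reduce expression easier to follow, here is a plain JavaScript emulation of it that runs without a database. The helper name `dedupeByName` is hypothetical and only serves the illustration; it keeps the first element for each name and preserves the original order, as the pipeline above does.

```javascript
// Plain-JavaScript emulation of the $reduce expression above.
// dedupeByName is a hypothetical helper for illustration, not a MongoDB API.
function dedupeByName(array) {
    return array.reduce(function (value, current) {
        // Mirrors { $in: ["$$this.name", "$$value.name"] }
        var seen = value.some(function (item) {
            return item.name === current.name;
        });
        // Mirrors $cond: keep the accumulator, or append the current element
        return seen ? value : value.concat([current]);
    }, []); // Mirrors initialValue: []
}

var input = [
    { name: "entry_1" },
    { name: "entry_2" },
    { name: "entry_2" },
    { name: "entry_3" }
];

console.log(JSON.stringify(dedupeByName(input)));
// [{"name":"entry_1"},{"name":"entry_2"},{"name":"entry_3"}]
```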


For older versions of MongoDB, use the $addToSet operator instead. It is an accumulator for the $group stage that returns an array of all unique values produced by applying an expression to each document in a group of documents sharing the same group-by key. It compares the embedded documents as a whole, so two entries count as duplicates only if they match exactly:

db.collection.aggregate([
    { $unwind: "$array" },
    { $group: {
        _id: "$_id",
        array: { $addToSet: "$array" },
        "name": { "$first": "$name" },
        "id": { "$first": "$id" }
    } }
])

The output contains the deduplicated array. Note that $addToSet does not guarantee the order of the elements:

/* 0 */
{
    "result" : [ 
        {
            "_id" : ObjectId("5570a775d41ac325b8cb9a5f"),      
            "id": null,
            "array" : [ 
                {
                    "name" : "entry_3"
                }, 
                {
                    "name" : "entry_2"
                }, 
                {
                    "name" : "entry_1"
                }
            ],
            "name" : ""
        }
    ],
    "ok" : 1
}

-- EDIT --

To set allowDiskUse to true, pass it in the options document that the aggregate() method accepts as its second parameter. For example, with the above pipeline, you could do something like this:

var pipeline = [
    { $unwind: "$array" },
    { $group: {
        _id: "$_id",
        array: { $addToSet: "$array" },
        "name": { "$first": "$name" },
        "id": { "$first": "$id" }
    } }
],
options = { "allowDiskUse": true };

db.collection.aggregate(pipeline, options);
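
Note that aggregate() only returns the deduplicated documents; it does not modify the stored ones. If the change should be written back, one option, assuming MongoDB 4.2 or later (where update commands accept an aggregation pipeline), is to run the $reduce expression inside updateMany(). The sketch below is untested against the asker's data; `persistDedupe`, the filter, and the `typeof db` guard are assumptions made for illustration, and `db.collection` stands in for the real collection name.

```javascript
// Update stage that rewrites "array" with the first element kept for each name.
var dedupeStage = {
    $set: {
        array: {
            $reduce: {
                input: "$array",
                initialValue: [],
                in: {
                    $cond: [
                        { $in: ["$$this.name", "$$value.name"] },
                        "$$value",
                        { $concatArrays: ["$$value", ["$$this"]] }
                    ]
                }
            }
        }
    }
};

// Hypothetical helper: applies the stage to every document whose "array"
// has at least two elements, since only those can contain duplicates.
function persistDedupe(database) {
    return database.collection.updateMany(
        { "array.1": { $exists: true } },
        [dedupeStage]
    );
}

// `db` is a global in the mongo shell only, so the call is guarded here.
if (typeof db !== "undefined") {
    printjson(persistDedupe(db));
}
```

Trying this on a copy of the collection first is advisable, since the update overwrites the array in place.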

4 Comments

I have to set allowDiskUse to true. But I'm not sure how to do this in the above query.
@user1772306 I've updated my answer to include this.
@chirdam Thank you, I updated the question accordingly. I think I'm close to a solution. Thank you for all the help!
Does this delete the duplicate record, or does it just select the first of the duplicates?
