29

In the following example, "Algorithms in C++" is present twice.

The $unset modifier can remove a particular field, but how do I remove a duplicate entry from an array field?

{
  "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), 
  "favorites" : {
    "books" : [
      "Algorithms in C++",    
      "The Art of Computer Programming", 
      "Graph Theory",      
      "Algorithms in C++"
    ]
  }, 
  "name" : "robert"
}

5 Answers

35

As of MongoDB 2.2 you can use the aggregation framework with $unwind, $group and $project stages to achieve this:

db.users.aggregate([{$unwind: '$favorites.books'},
                    {$group: {_id: '$_id',
                              books: {$addToSet: '$favorites.books'},
                              name: {$first: '$name'}}},
                    {$project: {'favorites.books': '$books', name: '$name'}}
                   ])

Note that the $project stage is needed to move books back under favorites.books, since field names in a $group stage cannot be nested.
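To make the three stages concrete, here is a rough plain-JavaScript sketch of what they do to the sample document. It only mimics the pipeline in memory: the helper names unwindBooks, groupById and projectFavorites are made up for illustration and are not MongoDB APIs, and the ObjectId is replaced by a plain string. Like $addToSet, it should not be relied on for any particular order of the resulting array.

```javascript
// Sample document from the question (ObjectId replaced by a plain string).
const doc = {
  _id: "4f6cd3c47156522f4f45b26f",
  favorites: { books: [
    "Algorithms in C++",
    "The Art of Computer Programming",
    "Graph Theory",
    "Algorithms in C++"
  ] },
  name: "robert"
};

// Rough equivalent of $unwind: one output document per array element.
function unwindBooks(docs) {
  return docs.flatMap(d =>
    d.favorites.books.map(book => ({ _id: d._id, name: d.name, book: book }))
  );
}

// Rough equivalent of $group with $addToSet (books) and $first (name).
function groupById(unwound) {
  const groups = new Map();
  for (const u of unwound) {
    if (!groups.has(u._id)) {
      groups.set(u._id, { _id: u._id, books: new Set(), name: u.name });
    }
    groups.get(u._id).books.add(u.book);
  }
  return Array.from(groups.values());
}

// Rough equivalent of $project: move the flat "books" field back under "favorites".
function projectFavorites(groups) {
  return groups.map(g => ({
    _id: g._id,
    favorites: { books: Array.from(g.books) },
    name: g.name
  }));
}

const result = projectFavorites(groupById(unwindBooks([doc])));
console.log(result[0].favorites.books);
```

The intermediate array produced by unwindBooks has one entry per book, which is also why the real $unwind stage can become expensive on large arrays.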


6 Comments

That is the right solution if you need to pipe more operators from the aggregation framework (to do statistics, for example). Thank you Kynan!
In the $group stage, why are you using name: {$first: '$name'}?
@Towhid Because each unwound entry has the same name, so you can take any in the $group stage, so I'm just taking the first.
The problem with this is that $unwind can generate a very large number of documents in the pipeline (I just ran into a case where it generated 1 million documents), and the $group stage's memory is limited to 100MB by default. You could increase the available memory, but that is not always possible or desirable.
@SalvadorJuanMartinez in this case you can always go to a full blown map reduce
22

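The easiest solution is to use $setUnion (available since MongoDB 2.6; the $addFields stage used here needs 3.4+). Note that $setUnion does not guarantee the order of the resulting array: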

db.users.aggregate([
    {'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])

Another (lengthier) version, based on the idea from @kynan's answer, that preserves all the other fields without explicitly specifying them (Mongo 3.4+; the $mergeObjects operator needs 3.6+):

> db.users.aggregate([
    {'$unwind': {
        'path': '$favorites.books',
        // output the document even if its list of books is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'books': {'$addToSet': '$favorites.books'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // the field, in the resulting document, has the value from the last document merged for the field. (c) docs
      // so the new deduped array value will be used
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    // this stage wouldn't be necessary if the field wasn't nested
    {'$addFields': {'favorites.books': '$books'}},
    {'$project': {'_other_fields': 0, 'books': 0}}
])

{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" : 
{ "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }
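Both pipelines above only return the deduplicated documents; aggregate() on its own does not write anything back to the collection. One way to persist the result, assuming MongoDB 4.2 or later (where update commands accept an aggregation pipeline), is sketched below. The filter and the variable names are illustrative choices, and the sketch assumes favorites.books is always an array wherever it is present.

```javascript
// Update pipeline: the same $setUnion trick, written as a $set stage.
// Assumption: MongoDB 4.2+, where updateMany() accepts an aggregation pipeline.
const dedupeBooks = [
  { $set: { "favorites.books": { $setUnion: ["$favorites.books", []] } } }
];

// Only touch documents that have the field at all.
const filter = { "favorites.books": { $exists: true } };

// Run the update only when a shell "db" object is available,
// so the snippet can also be loaded outside the mongo shell.
if (typeof db !== "undefined") {
  db.users.updateMany(filter, dedupeBooks);
}
```

As with the read-only version, the order of the stored array is not guaranteed after $setUnion.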

2 Comments

I'm a Mongo newbie. I found that I still need to add this aggregate to a updateMany() to actually update the records in the database. Is that the case?
@milesmeow it's been 5 years, so I don't remember exactly, but I think that the command above should be enough, as it includes addFields. Maybe something changed in the API recently.
3

What you have to do is use map reduce to detect and count duplicate tags, then use $set to replace the entire books array for the document matching { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }.

This has been discussed several times here; please see:

Removing duplicate records using MapReduce

Fast way to find duplicates on indexed column in mongodb

http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce

http://www.mongodb.org/display/DOCS/MapReduce

How to remove duplicate record in MongoDB by MapReduce?

1 Comment

Don't post just links, one of them is now broken :(
2
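// Return a copy of arr without duplicates, keeping first occurrences.
function unique(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        // hash keys are strings, which is fine for an array of titles
        if (!hash.hasOwnProperty(arr[i])) {
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// Rewrite the array client-side, one document at a time.
// The filter skips documents that lack favorites.books, which would otherwise throw.
db.collection.find({ "favorites.books": { $exists: true } }).forEach(function (doc) {
    db.collection.update({ _id: doc._id }, { $set: { "favorites.books": unique(doc.favorites.books) } });
})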
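One caveat with the hash-based helper above: object keys are strings, so values such as 1 and "1" would be treated as duplicates of each other. For an array of book titles that does not matter, but a Set-based variant avoids the issue and still keeps first-occurrence order. A minimal sketch (uniqueWithSet is a name chosen here for illustration):

```javascript
// Deduplicate while keeping the first occurrence of each value.
// Set membership does not coerce to strings, so 1 and "1" stay distinct.
function uniqueWithSet(arr) {
  return Array.from(new Set(arr));
}

const sampleBooks = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++"
];

console.log(uniqueWithSet(sampleBooks));
```

It can be dropped into the forEach loop in place of unique() without other changes.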

1 Comment

By bringing the logic out of MongoDB (and losing their native optimizations) this is almost guaranteed to be slower. While it may be useful to some (so I won't downvote), I'm sure it's unnecessarily inefficient and complex for many.
1

Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.

For instance, in order to remove duplicates from an array:

// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory",
//     "Algorithms in C++"
//   ]},
//   "name" : "robert"
// }
db.collection.aggregate([
  { $set:
    { "favorites.books":
      { $function: {
          body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
          args: ["$favorites.books"],
          lang: "js"
      }}
    }
  }
])
// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory"
//   ]},
//   "name" : "robert"
// }

This has the advantages of:

  • keeping the original order of the array (if that's not a requirement, then prefer @Dennis Golomazov's $setUnion answer)
  • being more efficient than a combination of expensive $unwind and $group stages.

$function takes 3 parameters:

  • body, which is the function to apply, whose parameter is the array to modify.
  • args, which contains the fields from the record that the body function takes as parameter. In our case "$favorites.books".
  • lang, which is the language in which the body function is written. Only js is currently available.
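Since the body is ordinary JavaScript, it can be tried outside MongoDB before wiring it into a pipeline. A small sketch of the same filter logic (dedupeKeepOrder is just an illustrative name); note that indexOf makes it quadratic in the array length, which should be fine for short arrays like this one:

```javascript
// Same logic as the $function body: keep an element only at its first index.
function dedupeKeepOrder(books) {
  return books.filter((value, index, array) => array.indexOf(value) === index);
}

const favoriteBooks = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++"
];

console.log(dedupeKeepOrder(favoriteBooks));
// [ 'Algorithms in C++', 'The Art of Computer Programming', 'Graph Theory' ]
```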

2 Comments

It is worth noting for anyone using MongoDB Atlas free tier that this command is not available on that tier. Still +1'd for the solution.
This has been deprecated
