
Let's say we have records with the following structure in the database.

{
  "_id": 1234,
  "tags" : [ "t1", "t2", "t3" ]
}

Now, I want to check whether the database contains a record with any of the tags specified in the array tagsArray, which is [ "t3", "t4", "t5" ].

I know about the $in operator, but I not only want to know whether any record in the database has any of the tags specified in tagsArray; I also want to know which tag of that record matches one of the tags in tagsArray (i.e. t3 in the case of the record shown above).

That is, I want to compare two arrays (one from the record and the other given by me) and find the common elements.

I need to use this expression along with many other expressions in the query, so projection operators like $, $elemMatch etc. won't be of much use. (Or is there a way they can be used without having to iterate over all records?)

I think I can use the $where operator, but I don't think that is the best way to do this. How can this problem be solved?

  • Are you saying that given the above sample document and your test list, you would expect the resulting array to contain "t3"? And similarly, if a document had both "t3" and "t4", then that would be the result for that document? Otherwise, if you just want to know which documents match, then you actually want $in. At any rate, $where would not be the best option for either case, and it could not "filter" the array as you are possibly suggesting. Commented Jun 26, 2014 at 9:51
  • Yes, I want to have either an array containing common elements or the first common element. I don't want to get just the matching documents. That's why $in does not help much. Commented Jun 26, 2014 at 10:08

1 Answer


There are a few approaches to do what you want; which one to use depends on your version of MongoDB. I am only showing the shell form here. The content is basically a JSON representation, which is not hard to translate to DBObject entities in Java, or to JavaScript to be executed on the server, so that really does not change.

The first and the fastest approach is with MongoDB 2.6 and greater where you get the new set operations:

var test = [ "t3", "t4", "t5" ];

db.collection.aggregate([
   { "$match": { "tags": {"$in": test } }},
   { "$project": {
       "tagMatch": {
           "$setIntersection": [
               "$tags",
               test
           ]
       },
       "sizeMatch": {
           "$size": {
               "$setIntersection": [
                   "$tags",
                   test
               ]
           }
       }
   }},
   { "$match": { "sizeMatch": { "$gte": 1 } } },
   { "$project": { "tagMatch": 1 } }
])

The new operators there are $setIntersection, which does the main work, and $size, which measures the array size and helps with the later filtering. This ends up as a basic comparison of "sets" in order to find the items that intersect.
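To make the "set" comparison concrete, here is a minimal plain-JavaScript sketch of what the two projected fields hold for the sample document from the question. The setIntersection function is a hypothetical stand-in written for illustration, not a MongoDB API, and it assumes the tags are simple strings. Like the real operator it drops duplicates, but it makes no attempt to reproduce the server's output ordering, which is unspecified.

```javascript
// Hypothetical helper: keeps each element of a that also appears in b, once.
function setIntersection(a, b) {
  var result = [];
  a.forEach(function (x) {
    // keep x only if it is in b and has not been kept already
    if (b.indexOf(x) !== -1 && result.indexOf(x) === -1) {
      result.push(x);
    }
  });
  return result;
}

var doc = { _id: 1234, tags: ["t1", "t2", "t3"] };
var test = ["t3", "t4", "t5"];

var tagMatch = setIntersection(doc.tags, test);  // plays the role of "tagMatch"
var sizeMatch = tagMatch.length;                 // plays the role of "sizeMatch"

console.log(tagMatch, sizeMatch);
```

A document whose tags share nothing with the test array would yield an empty array and a size of 0, which is what the later { "$gte": 1 } filter is guarding against.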

If you have an earlier version of MongoDB then this is still possible, but you need a few more stages, and this might affect performance somewhat if you have large arrays:

var test = [ "t3", "t4", "t5" ];

db.collection.aggregate([
   { "$match": { "tags": {"$in": test } }},
   { "$project": {
      "tags": 1,
      "match": { "$const": test }
   }},
   { "$unwind": "$tags" },
   { "$unwind": "$match" },
   { "$project": {
       "tags": 1,
       "matched": { "$eq": [ "$tags", "$match" ] }
   }},
   { "$match": { "matched": true }},
   { "$group": {
       "_id": "$_id",
       "tagMatch": { "$push": "$tags" },
       "count": { "$sum": 1 }
   }},
   { "$match": { "count": { "$gte": 1 } }},
   { "$project": { "tagMatch": 1 }}
])
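The performance concern comes from the two $unwind stages, which produce one row for every pairing of a document tag with a test value before anything is filtered. The following plain-JavaScript sketch mimics those stages for a single document to show the row count; intersectByUnwind and rowsExamined are hypothetical names used only for illustration and nothing here talks to MongoDB.

```javascript
// Hypothetical helper that mirrors the pipeline stages for one document.
function intersectByUnwind(doc, test) {
  var pairs = [];
  // Two $unwind stages: one row per (tag, test value) combination.
  doc.tags.forEach(function (tag) {
    test.forEach(function (candidate) {
      pairs.push({ _id: doc._id, tags: tag, match: candidate });
    });
  });
  // $project with $eq followed by $match on "matched": true.
  var matched = pairs.filter(function (p) {
    return p.tags === p.match;
  });
  // $group: push the surviving tags and count them.
  return {
    rowsExamined: pairs.length,
    tagMatch: matched.map(function (p) { return p.tags; }),
    count: matched.length
  };
}

var result = intersectByUnwind(
  { _id: 1234, tags: ["t1", "t2", "t3"] },
  ["t3", "t4", "t5"]
);
console.log(result);
```

For the sample document, three tags against three test values give nine intermediate rows for a single match, so the cost grows with the product of the two array lengths.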

Or if all of that seems too involved, or your arrays are large enough to make a performance difference, then there is always mapReduce:

var test = [ "t3", "t4", "t5" ];

db.collection.mapReduce(
    function () {
      var intersection = this.tags.filter(function(x){
          return ( test.indexOf( x ) != -1 );
      });
      if ( intersection.length > 0 ) 
          emit ( this._id, intersection );
   },
   function(){},   // reduce is not invoked here: each _id is emitted at most once
   {
       "query": { "tags": { "$in": test } },
       "scope": { "test": test },
       "out": { "inline": 1 }
   }
)
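The map function above can be tried outside the server with a small harness. This is only a sketch: the emit function below is a local stand-in for the one MongoDB provides inside map functions, the test variable is declared directly instead of arriving through "scope", and the second document is made up for illustration.

```javascript
var test = ["t3", "t4", "t5"];   // on the server this is supplied via "scope"
var emitted = [];

// Local stand-in for MongoDB's emit(); it just records what was emitted.
function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// Same logic as the map function passed to mapReduce.
function map() {
  var intersection = this.tags.filter(function (x) {
    return test.indexOf(x) !== -1;
  });
  if (intersection.length > 0) {
    emit(this._id, intersection);
  }
}

var docs = [
  { _id: 1234, tags: ["t1", "t2", "t3"] },
  { _id: 5678, tags: ["t1", "t2"] }      // hypothetical document with no common tag
];

// MongoDB calls map with "this" bound to each document; call() imitates that.
docs.forEach(function (doc) {
  map.call(doc);
});

console.log(JSON.stringify(emitted));
```

Only the first document emits anything, and because each _id is emitted at most once there is never more than one value per key for a reduce step to combine.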

Note that in all cases the $in operator still helps you to reduce the results even though it is not the full match. The other common element is checking the "size" of the intersection result to reduce the response.

All of these are pretty easy to code up. For the best results, convince the boss to switch to MongoDB 2.6 or greater if you are not already there.


1 Comment

This looks great! I will test this soon and consider accepting the answer. Thanks for the fabulous help.
