I have a MongoDB collection with an array field containing a list of strings. The strings may repeat. For example:

doc1 = {a: ["p", "q", "r", "p", "r"]}
doc2 = {a: ["p", "q", "q"]}
doc3 = {a: ["p"]}
doc4 = {a: ["p", "r", "r"]}

Given a string (say, "p"), I want to find all the documents whose array contains that string at least two times.

For example:

query("p") == [doc1]
query("q") == [doc2]
query("r") == [doc1, doc4]

Is there a way to do this directly in MongoDB? I know I can query for a single occurrence and then filter the results in my application, but I'd rather avoid that.

2 Answers

You could try something like the following. This query returns the _id of each matching document along with the count.

db.mycoll.aggregate([
    {$unwind:"$a"}, 
    {$group:{_id:{_id:"$_id", a:"$a"}, count:{$sum:1}}}, 
    {$match:{"_id.a":"r", count:{$gte:2}}}, 
    {$project:{_id:0, id:"$_id._id", count:1}}
])

Note that the $match stage contains "r". You can substitute that with "p" or "q".
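
If you want to avoid editing the pipeline by hand for each string, the substitution can be wrapped in a small helper. This is only a sketch: buildRepeatPipeline and its minCount parameter are hypothetical names, not part of the query above, and it assumes the array field is called a, as in the question.

```javascript
// Hypothetical helper (not part of the original answer): builds the same
// pipeline for any search string and minimum number of occurrences.
function buildRepeatPipeline(search, minCount = 2) {
  return [
    // One output document per array element.
    { $unwind: "$a" },
    // Count how often each value occurs within each source document.
    { $group: { _id: { _id: "$_id", a: "$a" }, count: { $sum: 1 } } },
    // Keep only the searched value, and only if it occurs often enough.
    { $match: { "_id.a": search, count: { $gte: minCount } } },
    // Return the source document's _id (as `id`) together with the count.
    { $project: { _id: 0, id: "$_id._id", count: 1 } }
  ];
}

// In the mongo shell, assuming the collection is named mycoll as above:
// db.mycoll.aggregate(buildRepeatPipeline("p"))
```

The helper only builds the array of stages, so it can be inspected or logged before it is passed to aggregate.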

3 Comments

I just tried this query with 'r' and am only getting one result.
Sorry, the group by was incorrect. I have fixed it now and edited my post.
A very different approach than my answer, but interesting. I wonder which is faster for large datasets.

var search = 'r';
docs.aggregate([
  {$match: { a : search } }, //step 1, filter to the arrays we care about for speed
  //could do a project here to trim fields depending on object size
  {$unwind: '$a'}, //unwind to create a separate row for each letter
  { $group: { _id: '$_id', total: { $sum: { $cond : [ { $eq: ['$a', search] }, 1, 0] } } } }, //the real work, explained below
  {$match : {total : {$gte: 2} } }, //grab the summed items with at least 2
  {$project: {_id: 1} } //grab just the _id field
]);

Notes:

I believe $elemMatch won't work here: it only tests whether at least one element of the array matches, so it can't count how many elements match.

The real work happens in the $group call, where the $sum is based on the condition of finding the element you're searching for in the array. This works because we've unwound them to be separate rows.
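
To make the conditional sum concrete, here is a rough in-memory imitation of the pipeline in plain JavaScript, run against the sample documents from the question. It is a sketch for illustration only: idsWithAtLeastTwo and sampleDocs are hypothetical names, the _id values are assumed, and none of this is a MongoDB API.

```javascript
// Sample documents from the question; the _id values are assumed for illustration.
const sampleDocs = [
  { _id: "doc1", a: ["p", "q", "r", "p", "r"] },
  { _id: "doc2", a: ["p", "q", "q"] },
  { _id: "doc3", a: ["p"] },
  { _id: "doc4", a: ["p", "r", "r"] }
];

// Hypothetical in-memory imitation of the pipeline, not a MongoDB call.
function idsWithAtLeastTwo(docs, search) {
  // Like the first $match plus $unwind: keep arrays that contain `search`,
  // then produce one row per array element.
  const rows = docs
    .filter(doc => doc.a.includes(search))
    .flatMap(doc => doc.a.map(value => ({ _id: doc._id, a: value })));

  // Like $group with $sum over $cond: add 1 for a matching row, 0 otherwise.
  const totals = new Map();
  for (const row of rows) {
    const previous = totals.get(row._id) || 0;
    totals.set(row._id, previous + (row.a === search ? 1 : 0));
  }

  // Like the final $match and $project: keep the _ids with a total of at least 2.
  return [...totals]
    .filter(([, total]) => total >= 2)
    .map(([id]) => id);
}
```

Calling idsWithAtLeastTwo(sampleDocs, "r") walks through the same steps as the aggregation: rows that are not "r" still reach the group step, but they contribute 0 to the total.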

Enjoy!
