
I have ~200,000 documents in a collection that look like this:

{
 "_id": "tdhABqSZPEZ2fFcEzOVCb-q8d",
 "user": "testuser",
 "content": "Test Content"
}

And I have an array with ~50,000 entries:

let arr = ["tree", "apple", "test", "orange", ...otherEntries] // ~ 50,000 entries

I want to get all documents whose content contains any element of the array, case-insensitively. For example, the document above should be returned because the array contains test and the document's content contains Test.

This works with $where and a JavaScript expression, but that is not very fast. Is there a really fast way (< 1-2 seconds) to run a query like this, or do you have any idea how to restructure the documents so that I can perform such a query quickly?
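For reference, the matching rule described above (return a document if its content contains any array element, ignoring case) can be written as a plain in-memory check. This is a minimal sketch, not the asker's code: matchesAny and lowerTerms are hypothetical names, the second document is made up for illustration, and it assumes a simple toLowerCase() comparison is an acceptable definition of "non case-sensitive". It also shows why a per-document $where is slow: each document may be compared against every one of the ~50,000 terms.

```javascript
// Hypothetical helper: true if `content` contains any of `lowerTerms`
// as a substring, ignoring case. Assumes toLowerCase() is an adequate
// case fold for this data and that the terms are already lower-cased.
function matchesAny(content, lowerTerms) {
  const haystack = content.toLowerCase();
  // In the worst case this checks every term for every document,
  // which is what makes a full-collection $where scan expensive.
  return lowerTerms.some((term) => haystack.includes(term));
}

// Lower-case the terms once instead of once per document.
const arr = ["tree", "apple", "test", "orange"];
const lowerTerms = arr.map((t) => t.toLowerCase());

const docs = [
  { _id: "tdhABqSZPEZ2fFcEzOVCb-q8d", user: "testuser", content: "Test Content" },
  // Hypothetical second document that should not match.
  { _id: "hypothetical-2", user: "otheruser", content: "Nothing relevant here" },
];

const matching = docs.filter((doc) => matchesAny(doc.content, lowerTerms));
console.log(matching.map((doc) => doc._id)); // [ 'tdhABqSZPEZ2fFcEzOVCb-q8d' ]
```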

  • Why don't you use text search (docs.mongodb.com/manual/text-search)? It uses an index, so it is highly recommended over a normal find(). Commented Aug 21, 2021 at 7:45
  • Because I want to input an array to check if any array element is in the text and I couldn't find anything like that in Text Search Commented Aug 21, 2021 at 7:49
  • Yeah, it is, but I don't know how I would do that. Can you answer this question with this solution so I can see if it works for me? Commented Aug 21, 2021 at 11:49

1 Answer

  • If your list (arr) were small, you could create one index and use $in and filter (but it's not small).

  • If you wanted case-sensitive matching, you could turn the list into a collection and use $lookup with indexes (but you want case-insensitive matching).

In your case, where the list is big and you want case-insensitive matching:

  • Join the list into one big string, separated with spaces (let's call it MyListString; it's a variable in your driver). For example, ["hat", "tree"] becomes "hat tree".

  • Create a text index on content. It's very easy to do; for example, in Java I used mycoll.createIndex(Indexes.text("content")); see your driver's documentation on how to create text indexes.

  • Do a find, or an aggregation with $match (MyListString is the big string variable from above). This does a case-insensitive match by default:

    { "$match": { "$text": { "$search": MyListString } } }
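The steps above can be sketched in JavaScript as plain filter and pipeline objects that a driver accepts. This is a minimal sketch: buildSearchString and textFilter are hypothetical helper names, myListString plays the role of MyListString, and the commented Node.js driver calls assume a connected collection object. One caveat worth testing on your data: $text matches whole words after tokenizing and stemming (stop words are ignored), not arbitrary substrings, so an entry like "app" would not match "apple" the way the question's "is in the content" wording might suggest.

```javascript
// Hypothetical helper: build the $search string from the term list.
// $text treats space-separated terms as alternatives (logical OR),
// and text-index matching is case-insensitive by default.
function buildSearchString(terms) {
  return terms.join(" ");
}

// Hypothetical helper: filter for find(), also usable as a $match body.
function textFilter(searchString) {
  return { $text: { $search: searchString } };
}

const arr = ["tree", "apple", "test", "orange"];
const myListString = buildSearchString(arr);
const filter = textFilter(myListString);
// A $match containing $text must be the first stage of the pipeline.
const pipeline = [{ $match: filter }];

console.log(myListString); // tree apple test orange
console.log(JSON.stringify(pipeline));
// [{"$match":{"$text":{"$search":"tree apple test orange"}}}]

// With the Node.js driver (not run here; assumes `collection` is a
// connected Collection):
//   await collection.createIndex({ content: "text" });
//   const docs = await collection.find(filter).toArray();
//   // or: await collection.aggregate(pipeline).toArray();
```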

Time was < 1 sec in my benchmark. Test it, but I think you will be fine.


3 Comments

Thank you for that answer! That's what I tried a few hours ago, but unfortunately I now have two problems: 1. Some strings in this large array contain spaces, so ["abc", "def ghi", "jkl"] would become "abc def ghi jkl". I would need to map the whole array so that every such string becomes e.g. "\"def ghi\"", and I don't know how to do this quickly for such a big array. 2. I have 10 of these large arrays and I want to make only one query. Is that possible? I tried it with $facet, but using $match with $text inside $facet throws an error, so I guess this is impossible.
But in fact, this is the solution to the problem as I described it.
Making that big string in the driver (for example in JavaScript) is very fast. If you have more arrays, you can make one array with the contents of all of them and build one string again; that will also be very fast. I'm not sure I understood the problem. If you are stuck or the query is slow, maybe ask a new question.
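On the two issues raised in the comments: merging the ten arrays into one list and one search string, as suggested, is a single linear pass. Quoting multi-word entries may not behave as hoped, though: according to the MongoDB $text documentation, when the search string includes a quoted phrase, only documents containing that phrase are matched, so the query would stop meaning "any element". The sketch below assumes a client-side post-filter is acceptable instead: multi-word entries are left unquoted (their words become extra OR'd terms, so $text returns a superset), and the full entries are then checked in the application. buildMergedSearch and containsAnyEntry are hypothetical names, and the post-filter costs roughly (returned documents) times (entries) substring checks, so measure it on your data.

```javascript
// Hypothetical helper: merge several term arrays (e.g. the ten arrays
// from the comment) into one deduplicated entry list and one $search
// string. Multi-word entries are NOT quoted: their words are simply
// added as extra OR'd terms, so the $text query returns a superset.
function buildMergedSearch(arrays) {
  const seen = new Set();
  const entries = [];
  for (const terms of arrays) {
    for (const raw of terms) {
      // Drop double quotes so no entry accidentally forms a $text phrase.
      const entry = raw.replace(/"/g, "").trim();
      const key = entry.toLowerCase();
      if (entry === "" || seen.has(key)) continue; // skip empties/duplicates
      seen.add(key);
      entries.push(key); // stored lower-cased for the post-filter
    }
  }
  return { entries, searchString: entries.join(" ") };
}

// Hypothetical client-side post-filter: keep a document only if its
// content contains at least one full entry as a case-insensitive substring.
function containsAnyEntry(content, entries) {
  const haystack = content.toLowerCase();
  return entries.some((entry) => haystack.includes(entry));
}

const arrays = [
  ["abc", "def ghi", "jkl"],
  ["Test", "jkl"], // "jkl" repeats across arrays and is kept once
];
const { entries, searchString } = buildMergedSearch(arrays);
console.log(searchString); // abc def ghi jkl test

// $text on this string would also return a document that only contains
// "def"; the post-filter drops it because the entry is "def ghi".
console.log(containsAnyEntry("only def here", entries)); // false
console.log(containsAnyEntry("Some DEF GHI text", entries)); // true
```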
