62

I want to find an account by name in a MongoDB collection of 50K accounts.

The usual way is to match on the exact string:

db.accounts.find({ name: 'Jon Skeet' })  // an index on name speeds this up

What about a regular expression? Is it an expensive operation?

db.accounts.find({ name: /Jon Skeet/ })  // can this query still use the index?

Edit:

According to WiredPrairie:
MongoDB uses the prefix of the regex to look up the index (e.g. /^prefix.*/):

db.accounts.find({ name: /^Jon Skeet/ })  // the index helps here

MongoDB $regex

  • 7
    @dirkk, I want to get more experiences and explanations. I also want to share the question too. Commented Jul 6, 2013 at 10:20
  • 3
    For regex to use an index, it must use an anchor as shown in the docs: docs.mongodb.org/manual/reference/operator/regex Commented Jul 6, 2013 at 11:19
  • possible duplicate of How to query mongodb with "like"? Commented Jul 6, 2013 at 11:21
  • There are many other very similar questions already answered on StackOverflow. Commented Jul 6, 2013 at 11:21
  • 1
    @WiredPrairie I want to focus on performance not about how to do query. Commented Jul 6, 2013 at 12:29

2 Answers

63

Actually, according to the documentation:

If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use

In other words:

For the /Jon Skeet/ regex, MongoDB scans every key in the index and then fetches the matching documents, which can be faster than a collection scan.

For the /^Jon Skeet/ regex, MongoDB scans only the range of index keys that start with that prefix, which is faster still.
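
To make the difference concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's implementation: a sorted array stands in for the index on name, and the sample names and the helpers scanAllKeys, lowerBound and scanPrefixRange are hypothetical, introduced only for illustration.

```javascript
// Illustrative model only: a sorted array stands in for the index on `name`.
const indexKeys = [
  "Alice Smith", "Bob Jones", "Jon Skeet", "Jon Skeet Jr", "Jon Snow",
  "Jonathan Skeet", "Mary Jon Skeet", "Zed Shaw",
].sort();

// Unanchored pattern: every key has to be tested against the regex.
function scanAllKeys(keys, regex) {
  let examined = 0;
  const matches = [];
  for (const key of keys) {
    examined++;
    if (regex.test(key)) matches.push(key);
  }
  return { matches, examined };
}

// First position whose key is >= target (binary search on the sorted keys).
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Prefix pattern /^prefix/: only keys in [prefix, upper) can match, where
// upper is the prefix with its last character incremented.
// (Assumes a non-empty prefix whose last character is not the maximum code unit.)
function scanPrefixRange(keys, prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  const start = lowerBound(keys, prefix);
  const end = lowerBound(keys, upper);
  return { matches: keys.slice(start, end), examined: end - start };
}

const unanchored = scanAllKeys(indexKeys, /Jon Skeet/);
const anchored = scanPrefixRange(indexKeys, "Jon Skeet");
console.log(unanchored);
console.log(anchored);
```

On this sample data the unanchored scan tests all 8 keys, while the prefix range touches only the 2 keys that start with "Jon Skeet". On a real collection, running the query with explain("executionStats") and comparing totalKeysExamined should show the same kind of gap.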


2 Comments

Regex works fine if there is an immediate match (e.g. matching the letter a). But if I match a full word (e.g. angular), results take much longer. This is across 6M documents; is there any way to speed these queries up? They take anywhere from 19 to 30 seconds for 8+ characters but come back immediately with 1-2 characters.
@chovy, I believe MongoDB is not the best tool for searching for string occurrences in the middle of text. I suggest looking at Elasticsearch or another full-text search engine.
16

In case anyone still has an issue with search performance, there is a way to optimize regex search even if it searches for a word in a sentence (not necessarily at the beginning ^ or the end $ of the string).

The field should have a text index

db.someCollection.createIndex({ someField: "text" })

and queries on it should apply the regex only after a plain text search:

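db.someCollection.find({ $and:
  [
    // the text index narrows the candidates to documents containing the word
    { $text: { $search: "someWord" }},
    // one clause per pattern: a repeated $regex key in a single object would keep only the last one
    { someField: { $regex: /test/i }},
    { someField: { $regex: /other/i }}
  ]
})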

This ensures that the regex will run only for the results of the initial, plain search, which should be quite fast thanks to the index on this field. It might have a huge impact on search performance, depending on how large the collection is.
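
The two-stage idea can be sketched in plain JavaScript, outside MongoDB. This is a toy model under stated assumptions: a Map of whole lowercased words stands in for the text index (no stemming or stop words), and the sample documents and the helpers buildWordIndex and searchThenRegex are hypothetical.

```javascript
// Toy documents; `someField` mirrors the field name used in the query above.
const docs = [
  { _id: 1, someField: "someWord is a test of other things" },
  { _id: 2, someField: "someWord without the extra patterns" },
  { _id: 3, someField: "a test of other things only" },
  { _id: 4, someField: "Another TEST with someWord and OTHER words" },
];

// Hypothetical stand-in for a text index: whole lowercased word -> set of _ids.
function buildWordIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    const words = doc.someField.toLowerCase().split(/\W+/).filter(Boolean);
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Stage 1: index lookup for a whole word. Stage 2: regexes on candidates only.
function searchThenRegex(documents, index, word, patterns) {
  const ids = index.get(word.toLowerCase()) ?? new Set();
  const candidates = documents.filter((doc) => ids.has(doc._id));
  const matches = candidates.filter((doc) =>
    patterns.every((re) => re.test(doc.someField))
  );
  return { regexTested: candidates.length, matches: matches.map((d) => d._id) };
}

const wordIndex = buildWordIndex(docs);
const full = searchThenRegex(docs, wordIndex, "someWord", [/test/i, /other/i]);
const partial = searchThenRegex(docs, wordIndex, "some", [/test/i, /other/i]);
console.log(full);
console.log(partial);
```

In this model the regexes run on 3 of the 4 documents for the whole word "someWord", and on none for the partial word "some", because the first stage only knows whole words. That trade-off is the price of the speedup.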

4 Comments

Thanks for the input. Still, I have to handle two search criteria. The whole word and then a part of the word.
This doesn't really work if you're not searching for full words. "some" will return nothing if you search by text index.
any updates on this?
for anyone unable to understand logic behind it: medium.com/statuscode/…
