
I have a contact list application that uses MongoDB to store the contacts and the Java driver to talk to the database. Each contact is its own document with a number of fields, including GivenName, Surname, and MiddleInitial.

I recently added 150,000 contacts, which slowed performance down. I added a compound index on Surname/GivenName/MiddleInitial (for sorting and for searching by Surname) and a single-field index on GivenName (for searching by GivenName). This helped in most cases, but not all. All of the searches are regular expressions anchored to the beginning of the string (e.g. ^Ale.*).

When searching by first name, queries that begin with q, u, x, or z run noticeably slower than queries for any other letter; when searching by last name, queries get slower the closer the first letter is to z. I have not been able to find any other examples of this type of problem. Any help is appreciated.

EDIT:

Here are the indexes:

collection.ensureIndex(new BasicDBObject("Surname",1).append("GivenName",1).append("MiddleInitial",1));
collection.ensureIndex(new BasicDBObject("GivenName", 1));

and the queries:

BasicDBObject contactInfo = new BasicDBObject("GivenName", new BasicDBObject("$regex", "(?i)^al.*")); // the field may be Surname instead; "al" is just an example prefix

DBCursor cursor = collection.find(contactInfo).sort(new BasicDBObject("Surname",1).append("GivenName", 1).append("MiddleInitial", 1));

Explain results a-z on GivenName are here

Explain results a-z on GivenName without sort are here

  • Run the query with the explain method and see if there is a difference between running it with 'a' as the prefix and 'z' as the prefix. See docs.mongodb.org/manual/reference/method/cursor.explain for more info. Commented Dec 17, 2013 at 15:00
  • I just ran it for a, n, and z on surnames. The only differences are in the nscanned and nscannedAllPlans fields (30, 97964, and 152633 respectively) and in the millis field, but that is just the time the query takes, so it should differ. Commented Dec 17, 2013 at 15:12
  • It might be useful to add more information about the queries and indexes being used. Commented Dec 17, 2013 at 15:20
  • Did you try the same query without sorting? Can you also add the explain() results? Commented Dec 17, 2013 at 15:58
  • Querying on GivenName without sorting gave similar results to querying on Surname: progressively slower from a to z. Commented Dec 17, 2013 at 16:16

1 Answer

You're doing a case-insensitive regular expression search, which will almost certainly keep MongoDB from using your indexes effectively. One option is to store each field twice, with one copy forced to upper or lower case, and then run your regex query against that copy. A starts-with query can still use an index, but not if you're ignoring case like that.
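To make that reasoning concrete, here is a stdlib-only Java toy model, not MongoDB's actual query planner. It assumes the index behaves like a sorted list of keys compared by plain binary (UTF-16) order, and the surnames are made-up sample data. In that order "ALLEN", "Alexander" and "alvarez" sort far apart, so a case-insensitive pattern has no single contiguous range to scan and must test every key, while a case-sensitive prefix on a lowercase copy only walks one short run. The `examined` counter plays the role of explain()'s nscanned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

/**
 * Toy model of an index: a TreeSet keeps keys in sorted (UTF-16 code unit) order.
 * "examined" stands in for explain()'s nscanned. Illustration only.
 */
public class PrefixScanDemo {

    /** Case-sensitive prefix: all matches sit in one contiguous run of sorted keys. */
    static int prefixRangeScan(TreeSet<String> index, String prefix, List<String> hits) {
        int examined = 0;
        // Jump straight to the first key >= prefix, then walk forward.
        for (String key : index.tailSet(prefix, true)) {
            examined++;
            if (!key.startsWith(prefix)) {
                break; // first key past the range: nothing later can match
            }
            hits.add(key);
        }
        return examined;
    }

    /** Case-insensitive regex: "Al", "AL" and "al" sort apart, so every key is tested. */
    static int fullScan(TreeSet<String> index, Pattern pattern, List<String> hits) {
        int examined = 0;
        for (String key : index) {
            examined++;
            if (pattern.matcher(key).find()) {
                hits.add(key);
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> surnames = List.of(
            "Adams", "Alexander", "alvarez", "ALLEN", "Baker", "Brown", "Zimmer", "zeller");

        TreeSet<String> rawIndex = new TreeSet<>(surnames);
        TreeSet<String> lowerIndex = new TreeSet<>();
        for (String s : surnames) {
            lowerIndex.add(s.toLowerCase(Locale.ROOT)); // the "stored twice" lowercase copy
        }

        List<String> ciHits = new ArrayList<>();
        int ciExamined = fullScan(rawIndex, Pattern.compile("(?i)^al"), ciHits);

        List<String> prefixHits = new ArrayList<>();
        int prefixExamined = prefixRangeScan(lowerIndex, "al", prefixHits);

        System.out.println("(?i)^al on raw keys:   examined " + ciExamined + ", matched " + ciHits);
        System.out.println("^al on lowercase copy: examined " + prefixExamined + ", matched " + prefixHits);
    }
}
```

Running it, the case-insensitive scan examines all 8 keys to find [ALLEN, Alexander, alvarez], while the prefix scan over the lowercase copy examines 4 (the three matches plus the first key past the range). With 150,000 contacts the gap grows accordingly.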


1 Comment

That fixed the speed problem. I'll find a workaround for the case sensitivity. Thank you.
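One possible way to implement the workaround the answer suggests is sketched below, under stated assumptions: the extra lowercase fields (called SurnameLower and GivenNameLower here) are hypothetical names, not part of the original schema, and the helper class is illustrative. It folds the stored value and the user's input the same way, escapes regex metacharacters so the input is treated as a literal prefix, and drops the trailing `.*`; MongoDB's $regex documentation notes that `^a` and `^a.*` match the same strings but the version without `.*` can stop scanning sooner.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Sketch of the "store a case-folded copy" workaround. Field names mentioned in
 * comments (SurnameLower, GivenNameLower) are hypothetical.
 */
public final class CaseFoldedPrefix {

    // Characters with special meaning in a regex, escaped so input is matched literally.
    private static final String REGEX_METACHARS = "\\^$.|?*+()[]{}";

    private CaseFoldedPrefix() {}

    /** Value to write into the extra field, e.g. a hypothetical "SurnameLower". */
    static String normalize(String value) {
        // Locale.ROOT avoids locale-dependent folding (e.g. Turkish dotted/dotless i).
        return value.toLowerCase(Locale.ROOT);
    }

    /**
     * Case-sensitive, left-anchored pattern for querying the lowercase field.
     * The input is folded the same way as the stored value, then escaped.
     */
    static String prefixRegex(String userInput) {
        StringBuilder sb = new StringBuilder("^");
        for (char c : normalize(userInput).toCharArray()) {
            if (REGEX_METACHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        // No trailing ".*": "^al" matches the same values as "^al.*".
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = normalize("Alexander");   // what goes into the lowercase copy
        String pattern = prefixRegex("Al");        // "^al"
        System.out.println(pattern + " matches " + stored + ": "
            + Pattern.compile(pattern).matcher(stored).find());
        System.out.println(prefixRegex("St. J"));  // the dot is escaped: ^st\. j
    }
}
```

With the legacy Java driver used in the question, the query would then look something like `new BasicDBObject("SurnameLower", new BasicDBObject("$regex", CaseFoldedPrefix.prefixRegex(input)))`, backed by an index on the lowercase field(s). The trade-off is that the extra fields must be kept in sync whenever a contact is inserted or updated.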
