I have a large MongoDB collection (approx. 30M records) and each item has an array of unique numbers consisting of 8 digits. Most of the time that array has only 1 element (1 number). I want to find out how many records the collection holds with numbers starting with 4, for example, so I query:

{ "numbers.number": /^4.*/i }

However, the query takes too long: the last time, it ran for 20 minutes before I interrupted it. So I wonder if there's a way to optimize the query. numbers.number is indexed. I also tried this one:

{ "numbers.number": /^4[0-9]{7}/}

It still takes too long. Here's an example document:

{ 
    "_id" : ObjectId("some_id"), 
    "created_at" : ISODate("2022-10-13T09:32:45.000+0000"), 
    "source" : {
        "created_at" : ISODate("2021-10-13T08:54:06.000+0000"), 
        "some_id" : NumberInt(234), 
        "another_id" : NumberInt(11)
    }, 
    "first_name" : "Test", 
    "last_name" : "Test", 
    "date_of_birth" : "1970-01-01", 
    "status" : "active", 
    "numbers" : [
        {
            "created_at" : ISODate("2022-11-13T09:32:45.000+0000"), 
            "number" : "40000005", 
            "_id" : ObjectId("some_id")
        }
    ]
}
  • Try to use $elemMatch Commented May 11, 2022 at 11:02
  • @Valijon can't use it with objects, see the example Commented May 11, 2022 at 11:22
  • numbers is an array, isn't it? Commented May 11, 2022 at 11:56
  • yes, but it's an array of objects, not numbers Commented May 11, 2022 at 12:16
  • Check this Commented May 11, 2022 at 14:54

1 Answer

Regular expressions can be costly in performance and speed, whether or not the field is indexed, when your data runs into the millions of documents.

Here is a similar question: "MongoDB, performance of query by regular expression on indexed fields".

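I'm not sure, as I have not compared or tested the performance, but try just the ^ anchor without .* (and without the i flag: MongoDB's documentation notes that case-insensitive regular expression queries generally cannot use indexes effectively, and digits have no case anyway):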

{ "numbers.number": /^4/ }

As the additional note in MongoDB's documentation on regex index use puts it:

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
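The documentation note is easier to follow with a picture of what the index does. Below is a simplified model in plain JavaScript, an illustration under assumptions and not MongoDB's actual implementation: every string that starts with "4" sorts within the half-open range ["4", "5"), so a prefix regex can be answered with two binary searches over the sorted index keys instead of testing every key. The helper names (lowerBound, countByPrefix) and the sample keys are hypothetical.

```javascript
// Simplified model (not MongoDB's code) of an index range scan for a prefix
// regex such as /^4/: all strings starting with "4" sort within ["4", "5").

// First position in the sorted array whose key is >= target (binary search).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Counts keys starting with a non-empty `prefix` using only the range bounds.
function countByPrefix(sortedKeys, prefix) {
  // Upper bound: bump the last character of the prefix ("4" -> "5").
  const upper =
    prefix.slice(0, -1) +
    String.fromCharCode(prefix.charCodeAt(prefix.length - 1) + 1);
  return lowerBound(sortedKeys, upper) - lowerBound(sortedKeys, prefix);
}

// Hypothetical sample keys standing in for the numbers.number index.
const keys = [
  "31234567", "40000005", "41111111", "49999999", "50000000", "72345678",
].sort();

console.log(countByPrefix(keys, "4")); // 3, via two binary searches
console.log(keys.filter((k) => /^4/.test(k)).length); // 3, by testing every key
```

A trailing .* does not change which strings match, so it only adds regex work on top of the same range.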


Second, if you know the range of the numbers, you could simply use the $gte and $lt operators to select a specific series:

{ 
  "numbers.number": {
    "$gte": "40000000",
    "$lt": "50000000"
  }
}
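The hard-coded bounds above generalize to any digit prefix. Here is a hypothetical helper, prefixRange, assuming every stored value is a digit string of one fixed width (8 here): the lower bound pads the prefix with zeros, and the upper bound bumps the prefix's last character before padding. For prefix "9" the upper bound comes out as ":0000000", because ":" is the character right after "9"; a numeric increment to "100000000" would sort before "90000000" as a string.

```javascript
// Hypothetical helper, not a MongoDB API. Assumes `prefix` is a non-empty
// digit string and every stored value has exactly `width` digits.
function prefixRange(prefix, width) {
  const last = prefix.charCodeAt(prefix.length - 1);
  // Next prefix in character order: "4" -> "5", "45" -> "46", "9" -> ":".
  const next = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return {
    $gte: prefix.padEnd(width, "0"),
    $lt: next.padEnd(width, "0"),
  };
}

// JavaScript's string comparison orders ASCII digits the same way as
// MongoDB's default (simple binary) string comparison.
function inPrefixRange(value, range) {
  return value >= range.$gte && value < range.$lt;
}

const range = prefixRange("4", 8);
console.log(JSON.stringify({ "numbers.number": range }));
// {"numbers.number":{"$gte":"40000000","$lt":"50000000"}}

console.log(JSON.stringify(prefixRange("9", 8)));
// {"$gte":"90000000","$lt":":0000000"}

console.log(inPrefixRange("40000005", range)); // true
console.log(inPrefixRange("39999999", range)); // false
```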

Third, you can check multiple ranges with the $or operator:

{ 
  "$or": [
    {
      "numbers.number": {
        "$gte": "4000000",
        "$lt": "5000000"
      }
    },
    {
      "numbers.number": {
        "$gte": "40000000",
        "$lt": "50000000"
      }
    }
  ]
}
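One caveat worth checking before relying on this $or: numbers.number holds strings, and MongoDB compares strings character by character (simple binary comparison by default), not by numeric value or length. So a 7-digit value can fall inside the 8-digit range and vice versa; the two ranges would separate 7- and 8-digit numbers only if the field stored numeric values. The plain-JavaScript sketch below uses the same two ranges with a few made-up sample values; inStringRange and matchesOr are hypothetical helpers.

```javascript
// The two ranges from the $or query above.
const ranges = [
  { $gte: "4000000", $lt: "5000000" },   // intended: 7-digit numbers starting with 4
  { $gte: "40000000", $lt: "50000000" }, // intended: 8-digit numbers starting with 4
];

// JavaScript's < on ASCII digit strings matches MongoDB's default
// string ordering, so this mirrors how each range is evaluated.
function inStringRange(value, range) {
  return value >= range.$gte && value < range.$lt;
}

// True if the value satisfies at least one branch of the $or.
function matchesOr(value) {
  return ranges.some((range) => inStringRange(value, range));
}

// Hypothetical sample values of different lengths.
for (const value of ["4000001", "40000005", "3999999", "5000000"]) {
  const perRange = ranges.map((range) => inStringRange(value, range));
  console.log(value, JSON.stringify(perRange), matchesOr(value));
}
// "4000001":  in both ranges (7 digits, yet inside the 8-digit range)
// "40000005": in both ranges (8 digits, yet inside the 7-digit range)
// "3999999":  in neither range
// "5000000":  in the second range only, although it starts with 5
```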

NOTE:

  • Try executing this query in the MongoDB shell.
  • If you only need the number of matching documents, always use a count function:
    • db.coll.find({query}).count()
    • db.coll.countDocuments({query})

5 Comments

Thanks, the first option worked like a charm; the other two didn't. Also, is there an efficient way to specify the length too? Let's say I want to count all records with numbers that start with 4 and have exactly 8 digits.
There is no straightforward way; you have to use an aggregation expression condition with $expr, $filter and $strLenCP. Is this a one-time process or a recurring feature?
You can use a range condition { "numbers.number": { "$gte": "40000000", "$lt": "50000000" } }
It's a recurring feature; I use it frequently to generate new unique numbers. I tried the range, but it is too slow.
There is no straightforward way. Check this playground; an aggregation query might work, but I'm not sure.
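The $expr/$filter/$strLenCP approach mentioned in the comments could look roughly like the filter built below. This is one possible shape under stated assumptions, not the query from the linked playground: buildLengthFilter, matchesLocally and their parameters are hypothetical, and the sketch assumes every element of numbers has a string number field ($strLenCP raises an error on non-string input) and that the prefix contains only digits. The plain anchored regex is kept so that an index on numbers.number can narrow the candidates, while the $expr part is evaluated document by document.

```javascript
// Hypothetical helper: builds a find()/countDocuments() filter for documents
// with at least one numbers[].number that starts with `prefix` and has
// exactly `length` characters. Assumes digit-only prefix and string numbers.
function buildLengthFilter(prefix, length) {
  return {
    // Anchored prefix regex (no .*, no i flag) so an index can be used.
    "numbers.number": new RegExp("^" + prefix),
    // Keep the elements that satisfy both conditions; require at least one.
    $expr: {
      $gt: [
        {
          $size: {
            $filter: {
              input: { $ifNull: ["$numbers", []] }, // missing array -> []
              as: "n",
              cond: {
                $and: [
                  { $eq: [{ $strLenCP: "$$n.number" }, length] },
                  { $eq: [{ $substrCP: ["$$n.number", 0, prefix.length] }, prefix] },
                ],
              },
            },
          },
        },
        0,
      ],
    },
  };
}

// Plain-JavaScript mirror of the same predicate, for checking it offline.
function matchesLocally(doc, prefix, length) {
  return (doc.numbers ?? []).some(
    (n) => n.number.length === length && n.number.startsWith(prefix)
  );
}

const filter = buildLengthFilter("4", 8);
console.log(JSON.stringify(filter.$expr)); // the regex part stringifies as {}, so print $expr only
console.log(matchesLocally({ numbers: [{ number: "40000005" }] }, "4", 8)); // true
console.log(matchesLocally({ numbers: [{ number: "4000005" }] }, "4", 8)); // false
```

Since mongosh runs JavaScript, the helper could be pasted there and used as db.coll.countDocuments(buildLengthFilter("4", 8)); whether it beats the plain prefix count would have to be measured on the real collection.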
