How to calculate the count of regex query results?

Question

I have a large MongoDB collection (approx. 30M records), and each document has an array of unique 8-digit numbers. Most of the time that array has only one element. I want to count the records in the collection whose numbers start with 4, for example, so I query:

{ "numbers.number": /^4.*/i }

However, the query takes too long: last time it ran for 20 minutes before I interrupted it. Is there a way to optimize the query? numbers.number is indexed. I also tried this one:

{ "numbers.number": /^4[0-9]{7}/}

It still takes too long. Here's an example document:

{ 
    "_id" : ObjectId("some_id"), 
    "created_at" : ISODate("2022-10-13T09:32:45.000+0000"), 
    "source" : {
        "created_at" : ISODate("2021-10-13T08:54:06.000+0000"), 
        "some_id" : NumberInt(234), 
        "another_id" : NumberInt(11)
    }, 
    "first_name" : "Test", 
    "last_name" : "Test", 
    "date_of_birth" : "1970-01-01", 
    "status" : "active", 
    "numbers" : [
        {
            "created_at" : ISODate("2022-11-13T09:32:45.000+0000"), 
            "number" : "40000005", 
            "_id" : ObjectId("some_id")
        }
    ]
}

Try to use $elemMatch

– Valijon, Commented May 11, 2022 at 11:02
@Valijon can't use it with objects, see the example

– Gevorg Melkumyan, Commented May 11, 2022 at 11:22
numbers is an array, isn't it?

– Valijon, Commented May 11, 2022 at 11:56
yes, but it's an array of objects, not numbers

– Gevorg Melkumyan, Commented May 11, 2022 at 12:16
Check this

– Valijon, Commented May 11, 2022 at 14:54

turivishal · Accepted Answer · 2022-05-11 11:34:12Z

2

Regular expression queries are costly in performance and speed whether or not the field is indexed, especially when the collection holds millions of documents.

This is a similar question: MongoDB, performance of query by regular expression on indexed fields.

I am not sure, as I have not compared or tested the performance, but try using just the ^ anchor without .*:

{ "numbers.number": /^4/ }

As per the additional note in MongoDB's documentation on regex index use:

Additionally, while /^a/, /^a.*/, and /^a.*$/ match equivalent strings, they have different performance characteristics. All of these expressions use an index if an appropriate index exists; however, /^a.*/, and /^a.*$/ are slower. /^a/ can stop scanning after matching the prefix.
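
A minimal sketch of building such a prefix filter programmatically, in JavaScript (the language of the MongoDB shell). The helper name buildPrefixFilter is hypothetical and not part of any MongoDB API. It keeps the pattern anchored and case-sensitive: the MongoDB documentation notes that case-insensitive regex queries generally cannot use indexes effectively, so the /i flag in the original query is worth dropping. Confirm the effect on your own data with explain().

```javascript
// Hypothetical helper: builds a case-sensitive, anchored prefix filter.
function buildPrefixFilter(field, prefix) {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // No trailing ".*" and no "i" flag: only the anchored prefix.
  return { [field]: new RegExp("^" + escaped) };
}

// Equivalent to { "numbers.number": /^4/ }
const filter = buildPrefixFilter("numbers.number", "4");

// In the MongoDB shell, assuming a collection named "coll":
//   db.coll.countDocuments(filter)
```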

Second, if you know the range of the numbers, I would suggest using the $gte and $lt operators to find a specific series by specifying its bounds:

{ 
  "numbers.number": {
    "$gte": "40000000",
    "$lt": "50000000"
  }
}
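
A sketch of deriving such bounds from a prefix, under some assumptions: the helper names fixedWidthPrefixRange and inRange are hypothetical, the numbers are stored as digit-only strings, and an inclusive upper bound ($lte with the prefix padded by 9s) is used instead of $lt so that the prefix "9" needs no special case. Because the values are strings, the comparison is lexicographic, so the range alone does not enforce a length; inRange imitates this with plain JavaScript string comparison, which for ASCII digits should order the same way as MongoDB's default binary comparison. The $elemMatch variant follows the suggestion in the comments; whether it lets MongoDB apply both bounds to the same array element on your index is something to verify with explain().

```javascript
// Hypothetical helper: lower/upper string bounds for a digit prefix at a fixed width.
function fixedWidthPrefixRange(prefix, width) {
  if (!/^[0-9]+$/.test(prefix) || prefix.length > width) {
    throw new RangeError("prefix must be digits and no longer than width");
  }
  return {
    $gte: prefix.padEnd(width, "0"), // "4" -> "40000000"
    $lte: prefix.padEnd(width, "9"), // "4" -> "49999999"
  };
}

const range = fixedWidthPrefixRange("4", 8);

// Dotted-path form, as in the answer.
const rangeFilter = { "numbers.number": range };

// $elemMatch form (an assumption to test, not a guaranteed speed-up).
const elemMatchFilter = { numbers: { $elemMatch: { number: range } } };

// Local imitation of the string comparison, for illustration only.
function inRange(value, r) {
  return value >= r.$gte && value <= r.$lte;
}

// A 7-character string can still fall inside the 8-character bounds.
const shortStringMatches = inRange("4000001", range);
```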

Third, you can check multiple ranges by using the $or operator:

{ 
  "$or": [
    {
      "numbers.number": {
        "$gte": "4000000",
        "$lt": "5000000"
      }
    },
    {
      "numbers.number": {
        "$gte": "40000000",
        "$lt": "50000000"
      }
    }
  ]
}

NOTE:

Try executing this query in the MongoDB shell.

Always use the count functions if you only need the number of documents:

db.coll.find({query}).count()

db.coll.countDocuments({query})
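
A small sketch of wrapping the count in a function with a time limit, so that a slow query fails instead of running for many minutes. It assumes `collection` is a MongoDB Node.js driver Collection (or the equivalent shell object) and relies on the maxTimeMS option of countDocuments; the name countWithTimeout and the default limit are hypothetical. countDocuments is used here because cursor count() is deprecated in recent drivers, as far as I know.

```javascript
// Hypothetical wrapper: counts matching documents, giving up after maxTimeMS.
// `collection` is assumed to expose countDocuments(filter, options).
async function countWithTimeout(collection, filter, maxTimeMS = 60000) {
  return collection.countDocuments(filter, { maxTimeMS });
}

// Usage, assuming `coll` is a connected collection:
//   const total = await countWithTimeout(coll, { "numbers.number": /^4/ }, 30000);
```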

edited May 11, 2022 at 11:34

answered May 11, 2022 at 11:24

turivishal


5 Comments

Gevorg Melkumyan Over a year ago

Thanks, the first option worked like a charm, the other two didn't. Also, is there an efficient way to specify the length too? Let's say I want to count all records with numbers that start with 4 and have exactly 8 digits.

turivishal Over a year ago

There is no straightforward way; you have to use an aggregation expression with $expr, $filter, and $strLenCP. Is it a one-time process or recurring functionality?

turivishal Over a year ago

You can use range condition { "numbers.number": { "$gte": "40000000", "$lt": "50000000" } }

Gevorg Melkumyan Over a year ago

It's recurring functionality; I use it frequently to generate new unique numbers. I tried the range, but it is too slow.

turivishal Over a year ago

There is no straightforward way; check this playground. An aggregation query might work, but I'm not sure.
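
The linked playground is not reproduced here, so the following is only a sketch of what an aggregation along those lines could look like, not the commenter's actual query. It assumes MongoDB 4.2+ (for $regexMatch), that numbers is always an array, and that every numbers.number is a string; the function name buildExactLengthCountPipeline is hypothetical. The first $match is there so the index can narrow the candidates before the per-element length check runs.

```javascript
// Hypothetical helper: pipeline counting documents that have at least one
// number with the given digit prefix and exactly `length` characters.
function buildExactLengthCountPipeline(prefix, length) {
  if (!/^[0-9]+$/.test(prefix)) {
    throw new TypeError("prefix must contain digits only");
  }
  const anchored = new RegExp("^" + prefix);
  return [
    // Stage 1: anchored prefix match, which can use the index on numbers.number.
    { $match: { "numbers.number": anchored } },
    // Stage 2: keep documents where some element has both the prefix and the length.
    {
      $match: {
        $expr: {
          $gt: [
            {
              $size: {
                $filter: {
                  input: "$numbers",
                  as: "n",
                  cond: {
                    $and: [
                      { $regexMatch: { input: "$$n.number", regex: anchored } },
                      { $eq: [{ $strLenCP: "$$n.number" }, length] },
                    ],
                  },
                },
              },
            },
            0,
          ],
        },
      },
    },
    // Stage 3: emit the count (no document is returned when nothing matches).
    { $count: "total" },
  ];
}

const pipeline = buildExactLengthCountPipeline("4", 8);

// In the MongoDB shell, assuming a collection named "coll":
//   db.coll.aggregate(pipeline)
```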
