MongoDB version 3.4.6
I have a collection whose documents resemble the following:
{
    organization: "ABC123",
    tags: ["MARTHA WASHINGTON", "+15552082000"],
    updatedAt: ISODate("2020-10-09T17:19:44.861Z"),
    createdAt: ISODate("2020-01-14T19:46:15.957Z")
}
I need to query by organization together with a "starts with" regex on the tags array, and optionally sort by updatedAt or createdAt. To support this, I created the following index:
{
    "organization" : 1,
    "tags" : 1,
    "createdAt" : -1
}
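For anyone reproducing this, the index can be created with something like the sketch below. It assumes the collection is named 'data' (as in the queries that follow) and reuses the index name reported in the explain output; `createTagIndex` is a hypothetical helper name, not part of the original setup.

```javascript
// Sketch: wraps the shell's createIndex() call for the compound index above.
// `createTagIndex` is a hypothetical helper name. The collection name 'data'
// and the index name are taken from the queries and explain output in this post.
function createTagIndex(db) {
  return db.getCollection('data').createIndex(
    { organization: 1, tags: 1, createdAt: -1 },
    { name: 'tag matches by organization' }
  );
}
```

In the mongo shell this would be called as createTagIndex(db). Because tags holds an array, MongoDB marks the resulting index as multikey.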
This is a multikey compound index which, based on my understanding of MongoDB, should allow the index to support both the filter and the sort in all cases. If I execute a query like:
db.getCollection('data').find({"organization": "ABC123", "tags": /^MARTHA WASHINGTO/})
The query is served by the index: I see only a FETCH stage over an IXSCAN stage.
Likewise, if I replace the regex with an exact match and add a sort, the index handles both the match and the sort, with no separate SORT stage:
db.getCollection('data').find({"organization": "ABC123", "tags": "MARTHA WASHINGTON"}).sort({"createdAt":-1})
However, if I combine the regex and the sort, an extra SORT stage suddenly appears in the plan. Example query:
db.getCollection('data').find({"organization": "ABC123", "tags": /^MARTHA WASHINGTO/}).sort({"createdAt":-1})
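A sketch of how the plan for this query can be retrieved, using the shell's cursor explain() helper with the "queryPlanner" verbosity. It assumes the indexed array field is tags, as in the index definition; `winningPlanFor` is a hypothetical helper name introduced only for illustration.

```javascript
// Sketch: returns the winning plan for the regex + sort query.
// `winningPlanFor` is a hypothetical helper; the field name `tags` is assumed
// from the index definition, and 'data' is the collection used in this post.
function winningPlanFor(db, organization, prefixRegex) {
  var explained = db.getCollection('data')
    .find({ organization: organization, tags: prefixRegex })
    .sort({ createdAt: -1 })
    .explain('queryPlanner');
  // The planner's chosen plan lives under queryPlanner.winningPlan.
  return explained.queryPlanner.winningPlan;
}
```

In the mongo shell: winningPlanFor(db, "ABC123", /^MARTHA WASHINGTO/).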
Here is the winning plan output from the explain:
"winningPlan" : {
    "stage" : "SORT",
    "sortPattern" : {
        "createdAt" : -1.0
    },
    "inputStage" : {
        "stage" : "SORT_KEY_GENERATOR",
        "inputStage" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "organization" : 1,
                    "tags" : 1,
                    "createdAt" : -1
                },
                "indexName" : "tag matches by organization",
                "isMultiKey" : true,
                "multiKeyPaths" : {
                    "organization" : [],
                    "search" : [
                        "search"
                    ],
                    "createdAt" : []
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "organization" : [
                        "[\"ABC123\", \"ABC123\"]"
                    ],
                    "tags" : [
                        "[\"MARTHA WASHINGTON\", \"MARTHA WASHINGTOO\")",
                        "[/^MARTHA WASHINGTON/, /^MARTHA WASHINGTON/]"
                    ],
                    "createdAt" : [
                        "[MaxKey, MinKey]"
                    ]
                }
            }
        }
    }
},
I am stumped about why this combination of regex and sort is not fully handled by the index. My understanding is that the extra in-memory SORT stage at the top of the plan will result in slow performance for large collections. Can anyone provide some guidance? Is there some limitation that I've missed?
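To compare plans quickly, a small helper along these lines can list the stages in a winning plan and flag a blocking sort. This is only a sketch: `listStages` and `hasBlockingSort` are hypothetical names, and the sample `plan` object is a trimmed-down copy of the plan in this post that keeps only the stage names.

```javascript
// Hypothetical helper: collects stage names from a winningPlan document,
// outermost stage first, following inputStage / inputStages links.
function listStages(plan) {
  var stages = [];
  function visit(node) {
    if (!node) { return; }
    stages.push(node.stage);
    if (node.inputStage) { visit(node.inputStage); }
    (node.inputStages || []).forEach(function (child) { visit(child); });
  }
  visit(plan);
  return stages;
}

// A separate sort step shows up as a SORT stage somewhere in the plan.
function hasBlockingSort(plan) {
  return listStages(plan).indexOf('SORT') !== -1;
}

// Trimmed-down shape of the regex + sort plan (stage names only).
var plan = {
  stage: 'SORT',
  inputStage: {
    stage: 'SORT_KEY_GENERATOR',
    inputStage: { stage: 'FETCH', inputStage: { stage: 'IXSCAN' } }
  }
};
```

For the sample above, listStages(plan) yields SORT, SORT_KEY_GENERATOR, FETCH, IXSCAN and hasBlockingSort(plan) is true, whereas a plan consisting only of FETCH over IXSCAN gives false.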
Update: winning plan when the regex is replaced with an exact match:
"winningPlan" : {
    "stage" : "FETCH",
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "organization" : 1,
            "search" : 1,
            "createdAt" : -1
        },
        "indexName" : "tag matches by organization",
        "isMultiKey" : true,
        "multiKeyPaths" : {
            "organization" : [],
            "search" : [
                "search"
            ],
            "createdAt" : []
        },
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 2,
        "direction" : "forward",
        "indexBounds" : {
            "organization" : [
                "[\"ABC123\", \"ABC123\"]"
            ],
            "tags" : [
                "[\"MARTHA WASHINGTON\", \"MARTHA WASHINGTON\"]"
            ],
            "createdAt" : [
                "[MaxKey, MinKey]"
            ]
        }
    }
},