6

We've recently decided to revisit some of our MongoDB indexes and came across a peculiar result when using a compound index which contains a multikey part.

It's important to note that we're using v2.4.5

TLDR: When using a compound index with multikey part, the bounds of a non-multikey field used for range restriction are dropped.

I'll explain the problem with an example:

Create some data

db.demo.insert(
[{ "foo" : 1, "attr" : [  {  "name" : "a" },  {  "name" : "b" },  {  "name" : "c" } ]},
 { "foo" : 2, "attr" : [  {  "name" : "b" },  {  "name" : "c" },  {  "name" : "d" } ]},
 { "foo" : 3, "attr" : [  {  "name" : "c" },  {  "name" : "d" },  {  "name" : "e" } ]},
 { "foo" : 4, "attr" : [  {  "name" : "d" },  {  "name" : "e" },  {  "name" : "f" } ]}])

Index

db.demo.ensureIndex({'attr.name': 1, 'foo': 1})

Query & Explain

Query on 'attr.name' but constrain the range of the non-multikey field 'foo':

db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                -1.7976931348623157e+308,
                3
            ]
        ]
    }
}

As you can see, the range of 'foo' is not as defined in the query, one end is completely ignored which results in nscanned being larger than it should.

Changing the order of the range operands will alter the dropped end:

db.demo.find({foo: {$gt: 1, $lt:3}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                1,
                1.7976931348623157e+308
            ]
        ]
    }
}

We're either missing out on some multikey index basics, or we're facing a bug.

We've gone through similar topics, including:

Unfortunately these posts address a different use-case where a range is set on the multikeyed value.

Other things we've tried to do:

  • Change the compound index ordering, starting with the non-multikey field.

  • Put the 'foo' value inside each of the subdocuments in the 'attr' array, index by ('attr.name', 'attr.foo') and do an $elemMatch on 'attr' with a range constraint on 'foo'.

  • Use an $and operator when defining the range:

    db.demo.find({'attr.name': 'c', $and: [{num: {$lt: 3}}, {num: {$gt: 1}}]})
    
  • Use MongoDB v2.5.4

None of the above had any effect (v2.5.4 made things worse by dumping both ends of the range completely).

Any kind of help would be highly appreciated!

Many Thanks,

Roi

2 Answers 2

2

With compound indexes where one of the indexed fields is an array, MongoDB will only use either a lower or upper bound for the range query to ensure correct matches are returned. See SERVER-958 for an example where constraining to both upper and lower index bounds would not find the expected document.

If your range query is on the array field you can potentially use the $elemMatch operator to optimise your query within the expected index bounds. As at MongoDB 2.4, the $elemMatch operator does not work on non-array fields so unfortunately this doesn't help your use case. You can watch/upvote SERVER-6050: Consider allowing $elemMatch applied to non arrays in the MongoDB issue tracker.

There is also an open issue SERVER-7959: Potentially unexpected scans with compound indexes when some fields are multikey describing this behaviour.

Sign up to request clarification or add additional context in comments.

9 Comments

Did you use different datSet? How come n:0?
@Stennie, It was late at night when I noticed your answer so I didn't notice the 'n:0' result which was mentioned by Asya as well. When trying this optimization on my hefty data set (as well as the demo), the bounds reported seem OK but the returned result is always empty. Any idea how to enjoy both the correct bounds and the correct results?
@AsyaKamsky: should be the same data set but I've mixed up something when testing the answer; will re-test.
@RoiTal: Will recheck .. thought $elemMatch should do the trick but it's late for me as well :). Feel free to unaccept while I work on a corrected answer.
@Stennie Finally, this issue seems to be fixed! See jira.mongodb.org/browse/SERVER-15086 The problem was that the all the index was marked as "multikey", so the indexbound resolution was not correct
|
1

The $min and $max operators may help to work around this issue, by allowing you to explicitly specify the index bounds.

Example:

db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).
 hint('attr.name_1_foo_1').
 min({'attr.name': 'c', foo: 1.000001}).
 max({'attr.name': 'c', foo: 3}).explain()

Result:

{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "start" : {
            "attr.name" : "c",
            "foo" : 1.000001
        },
        "end" : {
            "attr.name" : "c",
            "foo" : 3
        }
    }
}

There are some important caveats though:

  1. $min is always inclusive (like $gte), and $max is always exclusive (like $lt). You may have to adjust your values to get the effect of $gt or $lte.
  2. The fields in $min and $max must exactly match the fields in the index.
  3. You can only have one set of index bounds per query. There is no equivalent for an $in or $or query.
  4. While the operators are documented, it does not seem to be recommended for normal use cases.

Point 3 is a blocker for me (I need to do an $in on the array field), so I'm still looking for another solution.

Source: https://groups.google.com/forum/#!msg/mongodb-user/oxL8wuVdITA/uWJHVbMd_-8J

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.