
I have collection A with a non-unique index over "field1".

If I run:

db.A.explain().distinct("field1")

I get:

"winningPlan" : {
    "stage" : "PROJECTION",
    ...
    "inputStage" : {
        "stage" : "DISTINCT_SCAN",
        "keyPattern" : {
            "field1" : 1.0
        },
    ...
}

This suggests that distinct uses the index.

However, in collection B with a non-unique index on "type2.key", if I run:

db.B.explain().distinct("type2.key")

I get:

"winningPlan" : {
    "stage" : "COLLSCAN",
    "filter" : {
        "$and" : []
    },
    ...
}

This seems to mean that it doesn't use the index.
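To compare the two outputs mechanically rather than by eye, a small plain-JavaScript sketch like the one below can search an explain() plan for a given stage. The helper name collectStages is hypothetical (it is not part of the mongo shell API), and the plan objects are trimmed copies of the winningPlan fragments above with the elided "..." parts simply left out. It assumes the plan has been copied out of explain() as a plain object.

```javascript
// Hypothetical helper: recursively collect every "stage" name found anywhere
// in an explain() plan object. Because it walks all nested objects and arrays,
// it also reaches stages under inputStage, inputStages or per-shard plans.
function collectStages(plan, found = []) {
  if (Array.isArray(plan)) {
    for (const item of plan) collectStages(item, found);
  } else if (plan !== null && typeof plan === "object") {
    if (typeof plan.stage === "string") found.push(plan.stage);
    for (const value of Object.values(plan)) collectStages(value, found);
  }
  return found;
}

// Trimmed versions of the two winning plans shown above.
const planA = {
  stage: "PROJECTION",
  inputStage: { stage: "DISTINCT_SCAN", keyPattern: { field1: 1.0 } },
};
const planB = { stage: "COLLSCAN", filter: { $and: [] } };

console.log(collectStages(planA)); // [ 'PROJECTION', 'DISTINCT_SCAN' ]
console.log(collectStages(planA).includes("DISTINCT_SCAN")); // true
console.log(collectStages(planB).includes("DISTINCT_SCAN")); // false
```

Walking every value instead of only inputStage keeps the sketch tolerant of the extra nesting that sharded explain output adds.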

Why can distinct use the index on collection A but not on collection B, and can I do something to force the use of the index?

Notes:

  1. Collection B is a lot bigger than collection A. Is there a limit to the size of the index that distinct can use?
  2. I've read "Count distinct values in mongoDB" and "MongoDB - distinct with query doesn't use indexes", but they don't help explain the difference in behavior I'm seeing.
  3. Both collections are sharded.
  4. MongoDB version is 3.2.12.

EXAMPLE DOCUMENT

{
    "_id" : ObjectId("57d6c1cf691fa014e0615aa7"),
    "type1" : [ 
        {
            "key" : "key1",
            "field" : "value1",
        },
        {
            "key" : "key2",
            "field" : "value2",
        }
    ],
    "type2" : [ 
        {
            "key" : "key3",
            "field" : "value3",
        },
        {
            "key" : "key4",
            "field" : "value4",
        }
    ]
}

The index is on "type2.key".

  • The docs are very clear about it, docs.mongodb.com/manual/reference/method/db.collection.distinct/…: "When possible, it can use indexes." Try db.B.explain("allPlansExecution").distinct("obj.field2") to see why it is not possible. The "filter" part looks suspicious. Do you have any query parameter there? Commented Mar 29, 2017 at 8:43
  • @AlexBlex I would claim that "when possible" is as vague as it gets... Also, when I try the "allPlansExecution" mode, the explain never returns (it's a very big collection). Another thing I haven't mentioned is that both collections are sharded; I'll add it to the question notes. Commented Mar 29, 2017 at 8:59

1 Answer


The rules for when distinct can use an index are here: https://github.com/mongodb/mongo/blob/v3.4/src/mongo/db/query/get_executor.cpp#L1104

The most important line for this particular case, https://github.com/mongodb/mongo/blob/v3.4/src/mongo/db/query/get_executor.cpp#L1139, says:

Skip multikey indices if we are projecting on a dotted field.

"obj.field2" is a dotted field, so the index does not apply.

So basically, distinct() cannot use an index for a dotted field when that index is multikey, i.e. when some component of the indexed path, such as an array of subdocuments, holds an array.
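The single rule quoted above can be modeled as a tiny predicate. This is a simplified sketch, not the planner's code: passesDottedMultikeyRule is a hypothetical name, it encodes only the "skip multikey indices if we are projecting on a dotted field" check from the v3.4 source, and it ignores the other conditions the planner also checks (for example, that the index actually covers the distinct field).

```javascript
// Simplified model of one rule from get_executor.cpp (v3.4): a multikey
// index is skipped for DISTINCT_SCAN when the distinct field is dotted.
// Other planner conditions are intentionally not modeled here.
function passesDottedMultikeyRule(distinctField, indexIsMultikey) {
  const isDotted = distinctField.includes(".");
  return !(indexIsMultikey && isDotted);
}

// Collection A: "field1" is not dotted, so this rule never excludes its index.
console.log(passesDottedMultikeyRule("field1", false)); // true
// Collection B: "type2.key" is dotted and its index is multikey.
console.log(passesDottedMultikeyRule("type2.key", true)); // false
// Hypothetical dotted field inside a plain (non-array) subdocument.
console.log(passesDottedMultikeyRule("type2.key", false)); // true
```

The predicate only returns false when both conditions hold, which matches the difference between collections A and B in the question.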


5 Comments

Thanks! That's a very good lead... but isn't a multikey index an index over an array? If that's the case, then my index isn't multikey.
Fair enough. It would be worth adding that to the question, as well as the MongoDB version and maybe an example of documents in both collections, to avoid further confusion.
MongoDB version is 3.2.12; I've added it to the question. Come to think of it, the field before the dot in the index, obj, is an array of embedded documents. Would that make my index multikey? I'll add an example document to the question.
Yes, it is a multikey index. You can check it in the "isMultiKey" field of the IXSCAN stage in the explain() output of a query that uses that index.
My understanding is that since this fix (4.1.10), jira.mongodb.org/browse/SERVER-13298, multikey indexes on dotted fields are supported, but for a dotted field MongoDB will use IXSCAN and can't apply DISTINCT_SCAN.
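The multikey question in these comments can be illustrated with a small plain-JavaScript sketch. It assumes the documented rule that an index becomes multikey when any field on the indexed path holds an array; indexKeys is a hypothetical helper and a deliberate simplification, not MongoDB's key-generation code. Run against the example document from the question, it shows that an index on "type2.key" yields two keys for a single document, so the index is multikey even though "key" itself holds plain strings.

```javascript
// Simplified model of how an index on a (possibly dotted) path is built.
// indexKeys is a hypothetical helper, not a MongoDB API. It ignores details
// such as null keys for missing fields, positional paths and nested arrays.
function indexKeys(doc, path) {
  let values = [doc];
  let multikey = false;
  for (const part of path.split(".")) {
    const next = [];
    for (const value of values) {
      if (value === null || typeof value !== "object") continue;
      const child = value[part];
      if (child === undefined) continue;
      if (Array.isArray(child)) {
        // An array on the path fans out into its elements, so one document
        // can produce several index keys: this is what makes it multikey.
        multikey = true;
        next.push(...child);
      } else {
        next.push(child);
      }
    }
    values = next;
  }
  return { keys: values, multikey };
}

// The example document from the question, without _id (ObjectId is shell-only).
const doc = {
  type1: [
    { key: "key1", field: "value1" },
    { key: "key2", field: "value2" },
  ],
  type2: [
    { key: "key3", field: "value3" },
    { key: "key4", field: "value4" },
  ],
};

const result = indexKeys(doc, "type2.key");
console.log(result.multikey); // true
console.log(result.keys); // [ 'key3', 'key4' ]
```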
