Using Mongo 3.2.

Let's say I have a collection with this schema:

{ _id: 1, type: "a", source: "x" },
{ _id: 2, type: "a", source: "y" },
{ _id: 3, type: "b", source: "x" },
{ _id: 4, type: "b", source: "y" }

Of course, my real database is much larger, with many more types and sources.

I have created 4 indexes covering the combinations of type and source (even though 1 should be enough):

{type: 1},
{source: 1},
{type: 1, source: 1},
{source: 1, type: 1}

Now, I am running this distinct query:

db.test.distinct("source", {type: "a"})

The problem is that this query takes much more time than it should. If I run it with runCommand:

db.runCommand({distinct: 'test', key: "source", query: {type: "a"}})

this is the result I get:

{
    "waitedMS": 0,
    "values": [
        "x",
        "y"
    ],
    "stats": {
        "n": 19400840,
        "nscanned": 19400840,
        "nscannedObjects": 19400840,
        "timems": 14821,
        "planSummary": "IXSCAN { type: 1 }"
    },
    "ok": 1
}

For some reason, Mongo uses only the {type: 1} index, and only for the query stage. It should also use an index for the distinct stage. Why is that? Using the {type: 1, source: 1} index would be much better, no? Right now it is scanning all the type: "a" documents even though it has an index that covers them.

Am I doing something wrong? Do I have a better option for this kind of distinct?

2 Answers

1

As Alex mentioned, MongoDB apparently doesn't support this right now: a distinct with a query filter can't use DISTINCT_SCAN on the compound index. There is an open issue for it: https://jira.mongodb.org/browse/SERVER-19507
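
To illustrate why a distinct scan over {type: 1, source: 1} would touch far fewer keys than the plain index scan in the question, here is a minimal sketch in plain JavaScript. It is not MongoDB's implementation: the sorted `keys` array merely stands in for the compound index, and `compareKeys`, `upperBound` and `distinctScan` are hypothetical names made up for this example. The idea is to jump past all duplicates of each (type, source) pair with a binary search instead of visiting every entry.

```javascript
// Illustrative sketch only, NOT MongoDB's implementation.
// `keys` stands in for a {type: 1, source: 1} index: an array of
// [type, source] pairs sorted by type, then by source.

// Compare two [type, source] pairs in index order.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Binary search: first position whose key is strictly greater than `target`.
function upperBound(keys, target) {
  var lo = 0, hi = keys.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (compareKeys(keys[mid], target) <= 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect the distinct `source` values for one `type`, counting index seeks.
function distinctScan(keys, type) {
  var values = [];
  // Seek to the first entry with the requested type (binary search on type).
  var lo = 0, hi = keys.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (keys[mid][0] < type) lo = mid + 1;
    else hi = mid;
  }
  var pos = lo;
  var seeks = 1;
  while (pos < keys.length && keys[pos][0] === type) {
    values.push(keys[pos][1]);
    // Skip every duplicate of this (type, source) pair in one seek.
    pos = upperBound(keys, keys[pos]);
    seeks++;
  }
  return { values: values, seeks: seeks };
}

// Usage: 5 entries of type "a", but only 3 seeks are needed.
var keys = [
  ["a", "x"], ["a", "x"], ["a", "x"], ["a", "y"], ["a", "y"],
  ["b", "x"], ["b", "y"]
];
var result = distinctScan(keys, "a");
console.log(result.values, result.seeks); // values: ["x", "y"], seeks: 3
```

The number of seeks grows with the number of distinct values rather than with the number of matching documents, which is the difference between two sources and the 19400840 keys scanned in the question.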

2 Comments

From the issue, it looks like it should have been implemented in 3.4?
@Alvin Wong You are right. Thankfully, this feature has been implemented in Mongo 3.4.
-2

Just drop the first 2 indexes. You don't need them. Mongo can use {type: 1, source: 1} in any query that may need the {type: 1} index.

4 Comments

It doesn't change the results. Yes, it will use the {type: 1, source: 1} index, but only for the query stage, not for the distinct stage, so the time and the results are the same.
It does not use indexes for the distinct stage unless the index starts with the distinct key. In your case, {source: 1} can be used for DISTINCT_SCAN if the query is empty.
I know, but this is not what I want. I want to run distinct with a query, so it should use DISTINCT_SCAN on the {type: 1, source: 1} index.
Not doable at the moment. You can vote for the ticket jira.mongodb.org/browse/SERVER-19507, but I don't think it's going to be resolved any time soon.
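
Until the server can do this itself, one possible workaround is to emulate the distinct scan from the client with one small indexed query per distinct value. The sketch below is an assumption-laden illustration, not an official recipe: `distinctSources` is a hypothetical helper name, it assumes the {type: 1, source: 1} index from the question exists, and it assumes every matching document has a `source` value of a single comparable type (for example, all strings). It uses only the shell's find, sort, limit, hasNext and next.

```javascript
// Hypothetical helper (not a MongoDB API): emulate a distinct scan on
// `source` for one `type` by repeatedly asking for the smallest `source`
// greater than the last one seen.
// Assumes an index {type: 1, source: 1} and that `source` is always
// present and of one comparable type (e.g. always a string).
function distinctSources(coll, type) {
  var values = [];
  var filter = { type: type };
  while (true) {
    // Each query is expected to read a single index entry.
    var cursor = coll.find(filter, { _id: 0, source: 1 })
                     .sort({ source: 1 })
                     .limit(1);
    if (!cursor.hasNext()) break;
    var value = cursor.next().source;
    values.push(value);
    // Continue strictly after the value just collected.
    filter = { type: type, source: { $gt: value } };
  }
  return values;
}
```

In the mongo shell this would be called as distinctSources(db.test, "a"). It costs one round trip per distinct value, so it only pays off when the number of distinct sources is small compared with the number of matching documents.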
