
Let's assume that I have a collection, tx_collection, with three documents like the ones below:

{
    "block_number": 1,
    "value": 122,
    "transfers": [
        {
            "from": "foo1",
            "to": "bar1",
            "amount": 111
        },
        {
            "from": "foo3",
            "to": "bar3",
            "amount": 11
        }
    ]
},
{
    "block_number": 2,
    "value": 88,
    "transfers": [
        {
            "from": "foo11",
            "to": "bar11",
            "amount": 33
        },
        {
            "from": "foo22",
            "to": "bar22",
            "amount": 55
        }
    ]
},
{
    "block_number": 3,
    "value": 233,
    "transfers": [
        {
            "from": "foo1",
            "to": "bar1",
            "amount": 33
        },
        {
            "from": "foo3",
            "to": "bar3",
            "amount": 200
        }
    ]
}

For performance, I created a multikey index on transfers.amount.

When I sort by transfers.amount in descending order:

db.getCollection('tx_collection').find({}).sort({"transfers.amount": -1})

I expect the documents to be ordered by the maximum value of the transfers.amount subfield, like this:

{
    "block_number": 3,
    "value": 233,
    "transfers": [
        {
            "from": "foo1",
            "to": "bar1",
            "amount": 33
        },
        {
            "from": "foo3",
            "to": "bar3",
            "amount": 200
        }
    ]
},
{
    "block_number": 1,
    "value": 122,
    "transfers": [
        {
            "from": "foo1",
            "to": "bar1",
            "amount": 111
        },
        {
            "from": "foo3",
            "to": "bar3",
            "amount": 11
        }
    ]
},
{
    "block_number": 2,
    "value": 88,
    "transfers": [
        {
            "from": "foo11",
            "to": "bar11",
            "amount": 33
        },
        {
            "from": "foo22",
            "to": "bar22",
            "amount": 55
        }
    ]
}

With only three documents the sort works as expected: the order is block_number 3 -> block_number 1 -> block_number 2.
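The expected order follows from how MongoDB compares array fields: per the MongoDB sort-order documentation, a descending sort compares the largest element of the array (an ascending sort, the smallest). The plain-JavaScript sketch below (not MongoDB code) reproduces that rule for the three sample documents, keeping only the fields needed for sorting; the helper names sortKeyDesc and sortByTransfersAmountDesc are hypothetical.

```javascript
// Sample documents from the question, trimmed to the fields used for sorting.
const docs = [
  { block_number: 1, value: 122, transfers: [{ amount: 111 }, { amount: 11 }] },
  { block_number: 2, value: 88, transfers: [{ amount: 33 }, { amount: 55 }] },
  { block_number: 3, value: 233, transfers: [{ amount: 33 }, { amount: 200 }] },
];

// Hypothetical helper: the key a descending sort on "transfers.amount"
// uses for one document, i.e. the largest amount in the array.
function sortKeyDesc(doc) {
  return Math.max(...doc.transfers.map((t) => t.amount));
}

// Hypothetical helper: emulate sort({"transfers.amount": -1}) on an in-memory array.
function sortByTransfersAmountDesc(input) {
  // Copy first so the caller's array is not reordered.
  return [...input].sort((a, b) => sortKeyDesc(b) - sortKeyDesc(a));
}

const ordered = sortByTransfersAmountDesc(docs);
console.log(ordered.map((d) => d.block_number)); // [ 3, 1, 2 ]
```

The keys are 111, 55 and 200 for block_number 1, 2 and 3, which gives the order 3 -> 1 -> 2. The sketch deliberately ignores ties, missing fields and mixed types, which MongoDB resolves with its BSON comparison order.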

My issue is that with 19 million documents, the query fails with the following error:

"errmsg" : "Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.",

It seems that the multikey index is not used for the sort.
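If the sort is indeed done in memory, a rough estimate shows why it fails. The limit in the error message, 33554432 bytes, is exactly 32 MiB. The sketch below assumes a hypothetical average of 200 bytes per buffered document and ignores per-document overhead, so the real requirement is likely higher; substitute the collection's actual avgObjSize from db.getCollection('tx_collection').stats().

```javascript
// Limit reported in the error message.
const SORT_LIMIT_BYTES = 32 * 1024 * 1024; // 33554432

// Assumption: hypothetical average size of one buffered document, in bytes.
const AVG_DOC_BYTES = 200;
const NUM_DOCS = 19000000;

// Rough memory an in-memory sort over every document would need.
const estimatedBytes = NUM_DOCS * AVG_DOC_BYTES;

// How many documents of that size fit under the limit.
const maxDocsInLimit = Math.floor(SORT_LIMIT_BYTES / AVG_DOC_BYTES);

console.log(SORT_LIMIT_BYTES); // 33554432
console.log(estimatedBytes); // 3800000000
console.log(maxDocsInLimit); // 167772
console.log(estimatedBytes > SORT_LIMIT_BYTES); // true
```

Even at this modest per-document size, buffering all 19 million documents would need over a hundred times the allowed memory.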

Do you have any idea why this error is thrown?

For reference:

  • My MongoDB version is 3.6.3
  • tx_collection is sharded

Answer

As of MongoDB 3.6, I think this is expected behavior, as described on the documentation page Use Indexes to Sort Query Results, which states:

As a result of changes to sorting behavior on array fields in MongoDB 3.6, when sorting on an array indexed with a multikey index the query plan includes a blocking SORT stage. The new sorting behavior may negatively impact performance.

In a blocking SORT, all input must be consumed by the sort step before it can produce output. In a non-blocking, or indexed sort, the sort step scans the index to produce results in the requested order.

In other words, a "blocking sort" shows up in the query plan as a SORT_KEY_GENERATOR stage, which indicates an in-memory sort. This behavior changed from pre-3.6 MongoDB as part of SERVER-19402, which addressed inconsistencies in sorting on array fields.
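The difference between the two sort modes quoted above can be illustrated in plain JavaScript. This is a conceptual sketch, not MongoDB's implementation: the generator functions source, blockingSortDesc and indexedScan are hypothetical, and a presorted array stands in for an index. It counts how many input items each mode consumes before emitting its first result.

```javascript
// Input source that records how many items have been pulled from it.
function* source(items, stats) {
  for (const item of items) {
    stats.pulled += 1;
    yield item;
  }
}

// Blocking sort: must consume the whole input before producing any output.
function* blockingSortDesc(input) {
  const buffer = [...input]; // every input item is held in memory here
  buffer.sort((a, b) => b - a);
  yield* buffer;
}

// Non-blocking ("indexed") sort: input already arrives in the requested
// order, as an index scan would provide, so results stream out one by one.
function* indexedScan(input) {
  yield* input;
}

const amounts = [111, 11, 33, 55, 33, 200];

const blockingStats = { pulled: 0 };
const first = blockingSortDesc(source(amounts, blockingStats)).next().value;
console.log(first, blockingStats.pulled); // 200 6

// Stand-in for index order: the same values, already sorted descending.
const presorted = [...amounts].sort((a, b) => b - a);
const indexedStats = { pulled: 0 };
const firstIndexed = indexedScan(source(presorted, indexedStats)).next().value;
console.log(firstIndexed, indexedStats.pulled); // 200 1
```

The blocking version holds every input item in memory at once, which is what runs into the 32 MB limit on a large collection. This is also why the error suggests a smaller limit: according to the MongoDB documentation for cursor.sort(), when a sort cannot use an index and a limit k is given, MongoDB uses a top-k sort that buffers only k results, so memory use is bounded by the limit rather than by the collection size.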

There is a ticket to improve this situation: SERVER-31898. Unfortunately, there is no workaround for this behaviour yet.
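To check whether a particular query is affected, one option is to run it with .explain() and look for a SORT_KEY_GENERATOR stage in the winning plan; with its default "queryPlanner" verbosity, explain() only plans the query and does not execute it. The sketch below is plain JavaScript that walks an explain-style plan tree. It assumes the 3.6 explain layout, where each stage has a `stage` name and children under `inputStage`, `inputStages`, or, on a sharded cluster, `shards[].winningPlan`. The function findStages and the sample plan are hypothetical; the plan is hand-written for illustration, not real output.

```javascript
// Hypothetical helper: collect every stage name in an explain() plan tree.
function findStages(plan, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (plan.stage) found.push(plan.stage);
  // Single child stage.
  if (plan.inputStage) findStages(plan.inputStage, found);
  // Multiple child stages (e.g. for $or plans).
  for (const child of plan.inputStages || []) findStages(child, found);
  // Sharded clusters: one winning plan per shard.
  for (const shard of plan.shards || []) findStages(shard.winningPlan, found);
  return found;
}

// Hand-written, illustrative plan shaped like a sharded explain() result.
const samplePlan = {
  stage: "SHARD_MERGE_SORT",
  shards: [
    {
      shardName: "shard0",
      winningPlan: {
        stage: "SORT",
        inputStage: {
          stage: "SORT_KEY_GENERATOR",
          inputStage: { stage: "COLLSCAN" },
        },
      },
    },
  ],
};

const stages = findStages(samplePlan);
// stages → ["SHARD_MERGE_SORT", "SORT", "SORT_KEY_GENERATOR", "COLLSCAN"]
console.log(stages.includes("SORT_KEY_GENERATOR")); // true
```

In the mongo shell, the argument would be the queryPlanner.winningPlan field of db.getCollection('tx_collection').find({}).sort({"transfers.amount": -1}).explain(). The helper uses only basic ES2015 syntax; that the 3.6 shell accepts it is an assumption.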


Comment:

Thank you for your response. :) It was really helpful.
