
I have a collection with several billion documents and need to create a unique multi-key index for every attribute of my documents.

The problem is, I get an error if I try to do that because the generated keys would be too large.

pymongo.errors.OperationFailure: WiredTigerIndex::insert: key too large to index, failing

I found out MongoDB lets you create hashed indexes, which would resolve this problem; however, they cannot be used for multi-key indexes.

How can I resolve this?


My first idea was to add an extra attribute to each document containing a hash of all its attribute values, then create a unique index on that new field.
However, this would mean recalculating the hash every time I add a new attribute, on top of the considerable time needed to compute the hashes and build the index in the first place.
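For reference, the hash-field idea could be sketched like this: derive a fixed-length digest from a canonical serialization of the document, so that logically identical documents always produce the same hash. The function and field names here are hypothetical, not from the original post.

```python
import hashlib
import json

def doc_hash(doc):
    # Canonical JSON (sorted keys) so documents with the same content
    # hash identically regardless of key insertion order; default=str
    # handles non-JSON types such as ObjectId or datetime.
    canonical = json.dumps(doc, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two documents with the same content produce the same hash:
a = {"x": 1, "y": "some long value " * 100}
b = {"y": "some long value " * 100, "x": 1}
print(doc_hash(a) == doc_hash(b))  # True

# The digest is always a 64-character hex string, so an index on it
# cannot hit the 1024-byte key limit. With pymongo (hypothetical
# collection/field names) the unique index would then be:
#   collection.create_index("doc_hash", unique=True)
```

As the question notes, the cost is that the hash must be recomputed for every document whenever the attribute set changes.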

  • That might be a bit of a dead-end street. What underlying problem are you trying to solve by creating this index? Commented Aug 22, 2018 at 20:08
  • The saving procedure sometimes runs in parallel, saving multiple instances of the same data; a unique index would fix that, I believe. Commented Aug 23, 2018 at 7:13
  • That's one option, of course, unless you can somehow sync the parallel processes (through Mutexes or similar concepts). But do you really need all fields to form part of the unique index? Commented Aug 23, 2018 at 7:36
  • Unfortunately, yes. Much of the saved data is repeated with very little variation, which forces me to consider all of the fields. Working on the source of the problem would have been plan B in case this didn't work. Commented Aug 23, 2018 at 7:41

1 Answer


This behavior was added in MongoDB 2.6 to prevent the total size of an index entry from exceeding 1024 bytes (known as the Index Key Length Limit).

In MongoDB 2.6, if you attempt to insert or update a document so that the value of an indexed field is longer than the Index Key Length Limit, the operation will fail and return an error to the client. In previous versions of MongoDB, these operations would successfully insert or modify a document but the index or indexes would not include references to the document.

For migration purposes and other temporary scenarios, you can revert to the 2.4 handling of this case, where the error is not raised, by setting this MongoDB server parameter:

db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

This, however, is not recommended.
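Since the question uses pymongo, the same parameter can also be set from Python. This is a sketch, assuming a running server on localhost, a client authorized to run admin commands, and a server version that still supports `failIndexKeyTooLong` (the parameter was removed in later MongoDB releases):

```python
from pymongo import MongoClient

# Hypothetical connection string; adjust host/port/credentials as needed.
client = MongoClient("mongodb://localhost:27017")

# Equivalent of the mongo-shell command above, run against the admin
# database. This only suppresses the error: oversized keys are silently
# left out of the index, as in MongoDB 2.4.
client.admin.command({"setParameter": 1, "failIndexKeyTooLong": False})
```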

Also consider that creating indexes for every attribute of your documents may not be the optimal solution at all.

Have you examined how you query your documents and which fields you key on? Have you used explain to view the query plan? It would be an exception to the rule if you told us that you query on all fields all the time.

Here are the recommended MongoDB indexing strategies.

Excessive indexing has a price as well and should be avoided.


1 Comment

For my case, I would actually use the unique index to avoid saving duplicate data rather than to simplify queries. I do appreciate the links and explanation you provided, though.
