I'm using the Elasticsearch Bulk API to create or update documents.

I do actually know if they are creates or updates, but I can simplify my code by just making them all index, or "upserts" in the SQL sense.

Is there any disadvantage in using index (and letting ES figure it out) over using the more explicit create and update?

3 Answers

97

If you're sending create, the document must not exist yet in your index; otherwise that action fails with a version conflict. Sending the same document with index succeeds whether or not the document already exists.
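In a bulk request, a rejected create shows up as a failed item in the response rather than as a failure of the whole request. Here is a minimal sketch of scanning a bulk response for such items. The response dict is a hand-written illustration of the response shape (an `errors` flag plus per-item `status`), not captured output, and `failed_items` is a hypothetical helper name.

```python
from typing import Any


def failed_items(bulk_response: dict[str, Any]) -> list[tuple[str, str, int]]:
    """Return (action, _id, status) for every bulk item whose status is >= 400."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is a single-key dict keyed by the action name.
        for action, result in item.items():
            if result.get("status", 200) >= 400:
                failures.append((action, result.get("_id"), result["status"]))
    return failures


# Hand-written sample (assumption): "create" on an existing _id is rejected
# with 409, while "index" on the same _id replaces the document.
sample_response = {
    "errors": True,
    "items": [
        {"create": {"_index": "my-index", "_id": "1", "status": 409}},
        {"index": {"_index": "my-index", "_id": "1", "status": 200}},
    ],
}

print(failed_items(sample_response))
```

With index everywhere, this kind of conflict handling is unnecessary, which is the simplification the question is after.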

If you know you'll create a document once (with either create or index) and afterwards change only a few of its properties, then using update might make sense for performance reasons, since you only send the changed fields.

Otherwise, if you're always sending full documents, I'd use index all the time, for both creating and updating. Whenever it sees an index action, ES will either create the document if it doesn't exist or replace it if it exists, but the call will always succeed.
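As a rough sketch of the "index for everything" approach, the bulk body can be built as newline-delimited JSON with one index action per document. The function name, the index name `my-index`, and the sample documents are made up for illustration; sending the body to the cluster is left out.

```python
import json


def build_bulk_index_body(index_name, docs):
    """Build an NDJSON bulk body that uses the "index" action for every doc.

    docs maps a document id to its full source.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action metadata line, followed by the full document source.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_index_body(
    "my-index",
    {"1": {"title": "first"}, "2": {"title": "second"}},
)
print(body)
```

The same body works whether the ids already exist or not, so the caller does not need to track which documents are new.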

1 Comment

Also, the Index API does not support scripted updates, so you must use the Update API if you want to use scripts. The Update API does not support external versions, so you must use the Index API if you want to use external versioning.
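To make the two cases concrete, here is a hedged sketch of the corresponding bulk lines: an update action carrying a script, and an index action carrying an external version. The helper names, index name, and script are illustrative assumptions, and the metadata field names (`version`, `version_type`) follow recent Elasticsearch releases; older versions may spell them differently.

```python
import json


def scripted_update_lines(index_name, doc_id, source, params):
    """Two NDJSON lines for a bulk "update" action that runs a script."""
    action = {"update": {"_index": index_name, "_id": doc_id}}
    payload = {"script": {"lang": "painless", "source": source, "params": params}}
    return [json.dumps(action), json.dumps(payload)]


def external_version_index_lines(index_name, doc_id, version, source):
    """Two NDJSON lines for a bulk "index" action with external versioning."""
    action = {
        "index": {
            "_index": index_name,
            "_id": doc_id,
            "version": version,
            "version_type": "external",
        }
    }
    return [json.dumps(action), json.dumps(source)]


lines = scripted_update_lines(
    "my-index", "1", "ctx._source.counter += params.n", {"n": 1}
)
lines += external_version_index_lines("my-index", "2", 5, {"title": "second"})
print("\n".join(lines) + "\n")
```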
7

The short answer: no, there is no disadvantage.

The create and update actions are special cases. With create, nothing is written if the document is already there. With update, you can provide less data: if you do not have all the data of the document, you can send just a few fields. You can also use update to make sure the document is only written if it is already there.

2 Comments

Assume you could potentially replace a given document many, many times (like, indexing the exact same document many times). Since ES doesn't really "delete" anything, aren't you adding more and more documents and incrementing their version number, leaving it to the garbage collector later to clean up older versions? In that case, isn't using index vs create going to bloat your index in the short term, which might affect performance? And isn't the future heavy usage of garbage collection also going to affect performance? This is a real question I'm wondering, not a rhetorical one. Thanks
But when you update a document, isn't it fetch, modify and then index it anyway? In Updating a Whole Document, we said that the way to update a document is to retrieve it, change it, and then reindex the whole document. This is true. However, using the update API, we can make partial updates like incrementing a counter in a single request. We also said that documents are immutable: they cannot be changed, only replaced. The update API must obey the same rules.
4

You won't be able to use index for everything. According to the docs:

index will add or replace a document as necessary

Also, if you are updating a document, it might be worthwhile to add the 'doc_as_upsert' flag. More info here and here
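A minimal sketch of what that flag looks like in a bulk update action, assuming a partial document is sent under `doc`. The helper name, index name, and field values are hypothetical.

```python
import json


def partial_update_lines(index_name, doc_id, partial_doc, doc_as_upsert=False):
    """Two NDJSON lines for a bulk "update" action with a partial document.

    With doc_as_upsert=True, the partial document is used as the new
    document when the id does not exist yet, instead of the update failing.
    """
    action = {"update": {"_index": index_name, "_id": doc_id}}
    payload = {"doc": partial_doc}
    if doc_as_upsert:
        payload["doc_as_upsert"] = True
    return [json.dumps(action), json.dumps(payload)]


upsert_lines = partial_update_lines(
    "my-index", "1", {"status": "shipped"}, doc_as_upsert=True
)
print("\n".join(upsert_lines) + "\n")
```

Without the flag, the same action only changes documents that already exist.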
