How to avoid updating the index for every document in the Elasticsearch bulk API

Question

I'm using curl to add Apache log rows as documents to Elasticsearch using the bulk API. I post the following:

{"index": {"_type": "apache", "_id": "123", "_index": "apache-2017-01"}}
{"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/foo"}
{"index": {"_type": "apache", "_id": "124", "_index": "apache-2017-01"}}
{"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/bar"}
... more of the same ...
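For reference, a payload like the one above does not have to be written by hand. The following is a minimal Python sketch that assumes the log rows have already been parsed into dictionaries. `build_bulk_body` is a hypothetical helper name. The `_type` field is kept only because the question uses it; mapping types were later deprecated and then removed in newer Elasticsearch versions.

```python
import json

def build_bulk_body(rows, index, doc_type="apache"):
    """Build a newline-delimited JSON (NDJSON) body for the _bulk endpoint.

    `rows` is an iterable of (doc_id, document) pairs. Each document is
    preceded by an "index" action line, as in the payload above.
    """
    lines = []
    for doc_id, doc in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"

rows = [
    ("123", {"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/foo"}),
    ("124", {"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/bar"}),
]
body = build_bulk_body(rows, "apache-2017-01")
```

If you write this string to a file and post it with curl, use `--data-binary` rather than `-d` (which strips the newlines the bulk API depends on), together with a `Content-Type: application/x-ndjson` header.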

My guess is that for every log-row document, Lucene updates its index. But I do not need Elasticsearch to do that. I am perfectly fine with adding all the log-row documents first and updating the index afterwards.

Is this possible? Is it a good idea? Will it possibly improve performance?

The whole point of the bulk API is to perform a bunch of index/delete operations in an efficient way. Why do you suspect that Elasticsearch is doing it inefficiently? – femtoRgon, Nov 17, 2017 at 17:38

Nikolay Vasiliev · Accepted Answer · 2017-11-18 11:59:49Z

Your intuition is not far from the truth. By default, Elasticsearch refreshes its index every second:

The default index.refresh_interval is 1s, which forces Elasticsearch to create a new segment every second. Increasing this value (to say, 30s) will allow larger segments to flush and decreases future merge pressure.

So one way to increase indexing throughput is to increase index.refresh_interval, or even to disable refreshes entirely (by setting it to -1) and then turn them back on once you have finished your inserts. (Note that inserted documents become available for search only after a refresh, i.e. once the buffered documents have been written to a new segment that is opened for searching.)
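The disable, load, restore sequence can be sketched as follows. This is a hedged Python illustration rather than a complete client: `send(method, path, body)` is a hypothetical transport callable that you would back with curl, `urllib`, or an HTTP library, and restoring to `1s` assumes you want the default interval back. It uses the index settings endpoint (`PUT /<index>/_settings`) and the refresh endpoint (`POST /<index>/_refresh`).

```python
import json

def set_refresh_interval(send, index, interval):
    """Update index.refresh_interval through the index settings API.

    `send(method, path, body)` is a hypothetical transport callable.
    Passing interval="-1" disables periodic refreshes.
    """
    body = json.dumps({"index": {"refresh_interval": interval}})
    return send("PUT", "/%s/_settings" % index, body)

def bulk_load_without_refresh(send, index, bulk_bodies, restore_interval="1s"):
    """Disable refreshes, run the bulk requests, then restore the interval."""
    set_refresh_interval(send, index, "-1")
    try:
        for body in bulk_bodies:
            send("POST", "/_bulk", body)
    finally:
        # Restore the interval even if a bulk request raised an exception.
        set_refresh_interval(send, index, restore_interval)
        # An explicit refresh makes the newly indexed documents searchable now
        # instead of waiting for the next scheduled refresh.
        send("POST", "/%s/_refresh" % index, None)

# Usage with a stand-in transport that only records the requests it receives.
calls = []

def fake_send(method, path, body):
    calls.append((method, path, body))

bulk_load_without_refresh(
    fake_send,
    "apache-2017-01",
    ['{"index": {"_index": "apache-2017-01", "_type": "apache", "_id": "123"}}\n'
     '{"s": 200, "p": "/foo"}\n'],
)
```

The `try`/`finally` is a deliberate design choice: even if a bulk request fails, the index is not left with refreshes switched off.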

This, however, is not the only possible bottleneck when inserting documents into Elasticsearch. For example, you might consider using several threads to send bulk requests, or other tweaks described in the Tune for index speed section of the Elasticsearch documentation. You can look up other indexing parameters you may want to change in the Dynamic Index Settings section.
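Sending bulk requests from several threads can be sketched with Python's `ThreadPoolExecutor`. Here `send(method, path, body)` is a hypothetical, thread-safe transport callable, and `chunk_size=1000` and `workers=4` are illustrative defaults rather than recommendations; suitable values depend on document size and cluster capacity and are best found by experiment. If you use the official Python client, its `elasticsearch.helpers.parallel_bulk` helper covers similar ground.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def to_bulk_body(index, docs):
    """Serialize (doc_id, document) pairs into one NDJSON bulk body."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": "apache", "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

def parallel_bulk_index(send, index, docs, chunk_size=1000, workers=4):
    """Send one bulk request per chunk, several at a time.

    `send(method, path, body)` is a hypothetical, thread-safe transport.
    Responses are returned in chunk order (pool.map preserves order).
    """
    bodies = [to_bulk_body(index, chunk) for chunk in chunked(docs, chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda body: send("POST", "/_bulk", body), bodies))

# Usage with a stand-in transport that reports how many documents each request carried.
def fake_send(method, path, body):
    return body.count("\n") // 2

docs = [(str(i), {"p": "/page%d" % i}) for i in range(10)]
responses = parallel_bulk_index(fake_send, "apache-2017-01", docs, chunk_size=4, workers=2)
```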

Hope that helps!
