ElasticSearch Timeout Error: ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=60))

Question

I have an instance of ElasticSearch running on a server. When I try to index a huge corpus using multiprocessing, I get a lot of timeout errors. It seems that ElasticSearch can handle only a small number of requests. I've followed the configuration suggested on the ElasticSearch website. Are there any suggestions on what I should do to improve its indexing performance in a multiprocessing setting? The index that I'm adding documents to has one shard.

We have very few details about the configuration, and the bottleneck could come from many places. As a first change, set index.refresh_interval to -1 during the initial indexing (and change it back after the first ingestion). But as you are working on localhost, I guess you are doing a lot of I/O on the same HDD, or your RAM is full and your computer is swapping.
– Jaycreation, Commented Oct 8, 2020 at 5:03
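
The refresh-interval toggle suggested in the comment above could look roughly like this. It is a minimal sketch, not the commenter's code: it assumes an unsecured node at http://localhost:9200, a hypothetical index called `my-index`, and illustrative helper names. It calls the index settings endpoint (`PUT /<index>/_settings`) using only the Python standard library, and restores the interval to `1s`, which is the Elasticsearch default.

```python
import json
import urllib.request
from contextlib import contextmanager

ES_URL = "http://localhost:9200"  # assumption: local, unsecured node


def refresh_settings_body(interval):
    """Build the JSON body for PUT /<index>/_settings."""
    return json.dumps({"index": {"refresh_interval": interval}})


def set_refresh_interval(index, interval, base_url=ES_URL):
    """Send the settings update; needs a reachable Elasticsearch node."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=refresh_settings_body(interval).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


@contextmanager
def refresh_disabled(index, restore_to="1s", apply=set_refresh_interval):
    """Turn refresh off for the duration of a load, then restore it."""
    apply(index, "-1")
    try:
        yield
    finally:
        # Runs even if the load raises, so the index is never left at -1.
        apply(index, restore_to)
```

Wrapping the ingestion in `with refresh_disabled("my-index"):` keeps the restore step from being forgotten when a worker crashes partway through.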

Saeed Nasehi · Accepted Answer · 2020-10-09 10:56:26Z

There are several things you can do.

First, set refresh_interval. The refresh interval controls how long it takes for a newly added document to become available for search. If you can tolerate the delay, set it to at least 30 seconds, or to -1 to disable refreshes while loading. I have read that this can increase indexing performance by about 70%.
Second, use the bulk index API instead of indexing documents one at a time.
Disabling swap can improve performance in some cases.
Another option is to increase the amount of RAM assigned to Elasticsearch.
Finally, increasing the share of the heap used for indexing (the indexing buffer, indices.memory.index_buffer_size) can improve write performance. The default is 10 percent of the total heap size.
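
To illustrate the bulk index API point, here is a minimal sketch that batches documents into the newline-delimited JSON body that `POST /_bulk` expects. The node URL, the index name `my-index`, the batch size of 500, the 120-second timeout, and the helper names are all assumptions chosen for illustration, not values from the question.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured node


def build_bulk_body(docs, index):
    """Serialize documents as NDJSON: one action line, then one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_index(docs, index, batch_size=500, base_url=ES_URL, timeout=120):
    """POST documents in batches; return how many batches reported errors."""
    failed = 0
    for batch in chunked(docs, batch_size):
        req = urllib.request.Request(
            f"{base_url}/_bulk",
            data=build_bulk_body(batch, index).encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # The bulk response carries a top-level "errors" flag.
            if json.load(resp).get("errors"):
                failed += 1
    return failed
```

Each worker process could call `bulk_index(my_docs, "my-index")` on its own slice of the corpus, so the node receives far fewer, larger requests. If you already use the official Python client, its `elasticsearch.helpers.bulk` function serves the same purpose.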

I hope these points help.
