
I am a beginner with Elasticsearch, and I have to write one million random events into an Elasticsearch cluster (hosted in the cloud) with a Python script...

from elasticsearch import Elasticsearch
import certifi

es = Elasticsearch(
    [host_name],                 # host_name is defined elsewhere in the script
    port=9243,
    http_auth=("*****", "*******"),
    use_ssl=True,
    verify_certs=True,
    ca_certs=certifi.where(),
    sniff_on_start=True
)

Here's my code for the indexing:

import random
import uuid
from datetime import timedelta

import numpy as np

# 'data center a' is chosen as the source about twice as often as the others.
src_centers = ['data center a', 'data center b', 'data center c', 'data center d', 'data center e']
transfer_status = ['transfer-success', 'transfer-failure']

for i in range(1000000):
    transfer_src = np.random.choice(src_centers, p=[0.3, 0.175, 0.175, 0.175, 0.175])

    # The destination is any site other than the source, chosen uniformly.
    dst_centers = [x for x in src_centers if x != transfer_src]
    transfer_dst = np.random.choice(dst_centers)

    transfer_starttime = generate_timestamp()            # user-defined helper
    file_size = random.randrange(1024, 10000000000)      # random size from 1 KiB up to ~10 GB
    ftp = {
        'event_type': 'transfer-queued',
        'uuid': str(uuid.uuid4()),   # stringified so the document is JSON-serializable
        'src_site': transfer_src,
        'dst_site': transfer_dst,
        'timestamp': transfer_starttime,
        'bytes': file_size
    }
    print(i)
    es.index(index='ft_initial', id=(i + 1), doc_type='initial_transfer_details', body=ftp)

    # 95% of transfers succeed; failures get a fixed 10 s delay.
    final_status = np.random.choice(transfer_status, p=[0.95, 0.05])
    ftp['event_type'] = final_status

    if final_status == 'transfer-failure':
        time_delay = 10
    else:
        time_delay = int(transfer_time(file_size))   # user-defined helper; ranges roughly from 0-10000 s

    ftp['timestamp'] = transfer_starttime + timedelta(seconds=time_delay)
    es.index(index='ft_final', id=(i + 1), doc_type='final_transfer_details', body=ftp)

Is there a faster way to do this indexing?

Any help/pointers will be appreciated. Thanks.

  • What do you want to speed up? The indexing? The program itself? Please clarify your request. Commented Mar 12, 2017 at 21:40
  • Can you share your cluster topology with us: number of shards, nodes (master/data), and the hardware specifications of the cluster machines? It would also help to add your elasticsearch.yml file. Commented Mar 13, 2017 at 4:51
  • Topology: {event_type: "transfer-queued", uuid: 471a885a-9d8a-4212-8ebc-d1bc96c91b3b, bytes: 5411345, timestamp: 2017-03-04T05:40:40, src_site: "data centre a", dst_site: "data centre c"} Commented Mar 14, 2017 at 10:59

1 Answer

  1. Use bulk requests; otherwise you pay a lot of per-request overhead for every single document (see the first sketch below): https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
  2. Lower the refresh rate, or ideally disable it entirely until you're done (see the second sketch below): https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html#bulk
  3. Use monitoring (there's a free basic license) to see what the bottleneck actually is (IO, memory, CPU): https://www.elastic.co/guide/en/x-pack/current/xpack-monitoring.html
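For point 1, here is a minimal sketch of what the bulk approach could look like with the Python client's helpers module (the asker confirms below that helpers.bulk() solved it). build_event is a hypothetical stand-in for the document-building logic in the question's loop; the es client, index, and doc type are taken from the question:

from elasticsearch import helpers

def generate_actions(n):
    # Yield one bulk action per document instead of calling es.index() per document.
    for i in range(n):
        yield {
            '_index': 'ft_initial',
            '_type': 'initial_transfer_details',
            '_id': i + 1,
            '_source': build_event(i),   # hypothetical helper: builds one event dict
        }

# helpers.bulk batches the generator into bulk requests;
# chunk_size is the number of documents sent per request, tune to taste.
helpers.bulk(es, generate_actions(1000000), chunk_size=5000)

For point 2, disabling refresh for the duration of the load and restoring it afterwards could look roughly like this ('1s' is the default interval; adjust if your index uses a different value):

# Stop refreshing while the bulk load runs.
es.indices.put_settings(index='ft_initial', body={'index': {'refresh_interval': '-1'}})

# ... run the bulk load ...

# Restore the interval and force one refresh so the documents become searchable.
es.indices.put_settings(index='ft_initial', body={'index': {'refresh_interval': '1s'}})
es.indices.refresh(index='ft_initial')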

1 Comment

Thanks, I solved it by doing exactly that: I used the helpers.bulk() function.
