As part of data analysis, I collect records I need to store in Elasticsearch. As of now I gather the records in an intermediate list, which I then write via a bulk update.

While this works, it breaks down when there are too many records to fit into memory. I am therefore wondering whether a "streaming" mechanism exists that would let me

  • keep a connection to Elasticsearch open persistently
  • continuously send updates in a bulk-like way

I understand that I could simply open a connection to Elasticsearch and index each record individually as it becomes available, but that is about 10 times slower, so I would like to keep the bulk mechanism:

import elasticsearch
import elasticsearch.helpers
import elasticsearch.client
import random
import string
import time

index = "testindexyop1"
es = elasticsearch.Elasticsearch(hosts='elk.example.com')
if elasticsearch.client.IndicesClient(es).exists(index=index):
    ret = elasticsearch.client.IndicesClient(es).delete(index=index)

data = list()
for i in range(1, 10000):
    data.append({'hello': ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))})

start = time.time()
# this version (one request per record) takes 25 seconds
# for doc in data:
#     res = es.index(index=index, doc_type="document", body=doc)

# and this one - 2 seconds
elasticsearch.helpers.bulk(client=es, index=index, actions=data, doc_type="document", raise_on_error=True)

print(time.time()-start)

1 Answer

You can always split the data into n roughly equal chunks, each small enough to fit in memory, and then run n bulk updates. That seems like the easiest solution to me.
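A minimal sketch of that idea, assuming the records come from some iterable rather than a prebuilt list. The helper names `chunked` and `write_in_batches` and the `send_batch` callback are made up for illustration; only one batch is held in memory at a time.

```python
from itertools import islice


def chunked(records, size):
    """Yield lists of at most `size` items from any iterable, one at a time."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_in_batches(records, send_batch, size=1000):
    """Call send_batch(batch) once per chunk; return the number of records sent."""
    total = 0
    for batch in chunked(records, size):
        send_batch(batch)
        total += len(batch)
    return total


# With the client from the question, this could be wired up roughly as
# (record_source() is a hypothetical generator of documents):
# write_in_batches(
#     record_source(),
#     lambda batch: elasticsearch.helpers.bulk(
#         client=es, index=index, actions=batch, doc_type="document"),
# )
```

The batch size is a trade-off: larger batches mean fewer round trips but more memory per request, so it is worth tuning against your own data.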

2 Comments

Yes, this is a good solution, but it amounts to reimplementing streaming by hand. I was looking for something built in, if possible.
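On the built-in side, it may be worth knowing that the helpers in the Python client accept any iterable of actions, not only a list, and that `elasticsearch.helpers.streaming_bulk` consumes it in chunks (`chunk_size`, 500 by default). The sketch below assumes records can be produced lazily. `generate_actions` and `stream_to_elasticsearch` are hypothetical names, and older clusters that still use mapping types may also need a `_type` field in each action.

```python
import random
import string


def generate_actions(index, count):
    """Lazily yield one bulk action per record, so no full list is ever built."""
    alphabet = string.ascii_uppercase + string.digits
    for _ in range(count):
        yield {
            "_index": index,
            "_source": {"hello": "".join(random.choices(alphabet, k=10))},
        }


def stream_to_elasticsearch(es, actions, chunk_size=500):
    """Feed an iterable of actions to streaming_bulk and count the successes."""
    # Imported here so the generator above can be tried without the client installed.
    from elasticsearch.helpers import streaming_bulk

    succeeded = 0
    for ok, _item in streaming_bulk(es, actions, chunk_size=chunk_size):
        if ok:
            succeeded += 1
    return succeeded


# Possible usage with the client from the question:
# stream_to_elasticsearch(es, generate_actions("testindexyop1", 10000))
```

Passing the same generator to `elasticsearch.helpers.bulk` should behave similarly, since that helper is a wrapper around `streaming_bulk`.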
