
I am new to Elasticsearch and have a huge amount of data (more than 16k very large rows in a MySQL table). I need to push this data into Elasticsearch and am running into problems indexing it. Is there a way to make indexing faster? How should I deal with such a large amount of data?

  • 16K documents is in fact a small number. Indexing time depends mostly on your index definition (analyzers used, etc.) and Lucene configuration values such as mergeFactor. It's hard to give you any precise answer without this information, but you can start by increasing mergeFactor to see if the problem is on the Elasticsearch side (see the sketch after these comments). Maybe the bottleneck is somewhere else? Commented May 21, 2012 at 8:49
  • The code I am using to index is simply foreach ($results as $row) { $json = json_encode((array)$row); $e->add($type, $counter++, $json); } where add() is function add($type, $id, $data) { return $this->call($type . '/' . $id, array('method' => 'PUT', 'header' => "Content-Type: application/x-www-form-urlencoded\r\n", 'content' => $data)); } and I guess I have not used any analyzers. The problem is not the 16k rows themselves but that those rows have fields which each contain the data of an entire table, so the amount of data to index is huge. Commented May 21, 2012 at 9:00
  • Have you tried increasing the mergeFactor then? Or profiling how long it takes to json_encode these "entire tables"? Commented May 21, 2012 at 9:19
  • I don't know about mergeFactor... and json_encode is quite fast, so I don't think the problem is there. Commented May 21, 2012 at 9:26
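
(For reference: on the old Elasticsearch versions this thread is about, Lucene merge settings such as mergeFactor were exposed through the index settings API. Below is a minimal sketch, assuming an ES 0.x/1.x node with the default log merge policy; the setting name index.merge.policy.merge_factor and whether it can be changed on a live index are version-dependent assumptions, so check the documentation for your release.

# Hypothetical example: raise the merge factor on an existing index named "test"
curl -XPUT 'localhost:9200/test/_settings' -d '{
    "index.merge.policy.merge_factor" : 30
}'

A higher merge factor defers segment merging, which generally trades slower searches for faster bulk indexing.)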

3 Answers


Expanding on the Bulk API

You will make a POST request to the /_bulk endpoint.

Your payload will follow the format below, where \n is the newline character.

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
...

Make sure your JSON is not pretty-printed: each object must occupy exactly one line.

The available actions are index, create, update and delete.


Bulk Load Example

To answer your question: if you just want to bulk load data into your index, each pair of lines looks like this.

{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }

The first line contains the action and metadata. In this case, we are calling create. We will be inserting a document of type type1 into the index named test with a manually assigned id of 3 (instead of elasticsearch auto-generating one).

The second line contains all the fields in your mapping, which in this example is just field1 with a value of value3.

You will just concatenate as many of these pairs as you'd like to insert into your index.
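
Putting it together, here is a minimal end-to-end sketch, assuming a node listening on localhost:9200 (the file name bulk.json is just a placeholder). Note the --data-binary flag: plain -d strips the newlines that the bulk format depends on.

# bulk.json -- one action line followed by one source line per document;
# the file must end with a newline
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "field1" : "value2" }

# POST the whole file to the _bulk endpoint in a single request
curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json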


2 Comments

Thanks Kirk. Would you mind expanding your example, I would like to work with this further. So using your example, if I have 100 entries from (say) a database to index, I will create a single JSON of 200 lines, right? What would the URL be to post this to the indexing server? Can you provide an example of a full JSON string that does this?
OK, I found this, elastic.co/guide/en/elasticsearch/reference/current/… that will send me on my way.

This may be an old thread, but I thought I would comment anyway for anyone who is looking for a solution to this problem. The JDBC river plugin for Elasticsearch is very useful for importing data from a wide array of supported databases.

Link to the JDBC river source here. Using curl from Git Bash, I PUT the following configuration document to allow communication between the ES instance and the MySQL instance:

curl -XPUT 'localhost:9200/_river/uber/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "strategy" : "simple",
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/elastic",
        "user" : "root",
        "password" : "root",
        "sql" : "select * from tbl_indexed",
        "poll" : "24h",
        "max_retries" : 3,
        "max_retries_wait" : "10s"
    },
    "index" : {
        "index" : "uber",
        "type" : "uber",
        "bulk_size" : 100
    }
}'

Ensure you have the mysql-connector-java-VERSION-bin JAR in the river-jdbc plugin directory, which holds the jdbc-river's required JAR files.
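
In case it helps anyone setting this up, here is a rough sketch of the install steps, assuming an Elasticsearch 1.x directory layout and the old bin/plugin tool; the plugin URL and driver version below are placeholders, so substitute the ones for your release.

# install the JDBC river plugin (exact name and URL depend on your ES version)
./bin/plugin --install river-jdbc --url <plugin-zip-url>

# drop the MySQL JDBC driver next to the plugin's own JARs, then restart the node
cp mysql-connector-java-VERSION-bin.jar plugins/river-jdbc/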



Try the Bulk API:

http://www.elasticsearch.org/guide/reference/api/bulk.html

2 Comments

Please expand on your answer; link-only answers are not future-proof.
The Bulk API with curl won't take my multiline JSON data, which is about 2 GB in size.
