
I've seen a few similar posts here on Stack Overflow, but I still don't have a clear understanding of how to index a large file of JSON documents into Elasticsearch; I'm getting errors like the following:

{"error":"ActionRequestValidationException[Validation Failed: 1: index is missing;2: type is missing;]","status":400}

{"took":231,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":7,"status":200}}]

I have a JSON file that is about 2 GB in size, which is the file I actually want to import. But first, in order to understand how the Bulk API works, I created a small file with just a single line of actual data:

testfile.json

{"index":{"_id":"someId"}} \n
{"id":"testing"}\n

I got this from another post on SO. I understand that the first line is a header, and that the "index" in the first line is the action that will be sent to ES; however, this still does not work. Can someone please give me a working example and a clear explanation of how to import a JSON file into ES?

Thank you!

1 Answer


The following sample comes from the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html?q=bulk

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }

So line one tells Elasticsearch to index the document on line two into the index test, type type1, with _id 1. The document it indexes contains field1. If all your documents go to the same index and type, you can put them in the URL instead and leave them out of each action line. Check the link for samples.
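For example, with a file like your testfile.json (where the action lines only contain _id), you could specify the index and type in the URL. The following is just a sketch that assumes a node on localhost:9200 and the index/type names from the sample above; adjust them to your own:

# Send the bulk file to a URL that already names the index and type,
# so _index and _type can be omitted from each action line.
# --data-binary keeps the newlines intact (-d would strip them and break
# the bulk format), and the file must end with a newline.
curl -s -XPOST 'http://localhost:9200/test/type1/_bulk' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @testfile.json

Recent Elasticsearch versions require the Content-Type header for bulk requests; older versions simply accept the request without checking it.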

Line three shows an example of a delete action; a delete does not need a document on the following line.

Be careful with very large files; 2 GB is probably too big. The whole request needs to be sent to Elasticsearch, which loads it into memory, so there is a limit to the number of records you can send in one bulk request.
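A common workaround is to split the file and send it in several smaller bulk requests. This is only a sketch: it assumes the file is already in bulk format, that every action line is followed by a document line, and it reuses the hypothetical index/type names and local node from above:

# 10000 lines = 5000 action/document pairs per request; an even line count
# keeps each action line together with its document.
split -l 10000 bulkfile.json bulk_chunk_

for chunk in bulk_chunk_*; do
  curl -s -XPOST 'http://localhost:9200/test/type1/_bulk' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @"$chunk"
  echo    # newline between the responses
done

A few thousand documents per request is usually a safer starting point than one huge request; you can raise the chunk size if the responses stay fast.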


Comments

This is the error I get when I use the first two lines in this example: {"took":36,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":8,"status":200}}]} and I still don't have a clear understanding of why there is an "_index", "_type", and "_id" field for a document that contains "field1" and "value1".
Can you update your question with the code you use to execute the bulk? Do you use curl or something else?
Hi Jettro; apparently that was actually not an error - it's the correct output! However, I have no idea what that output means. I understand 2 GB may be too large to index all at once, but say I have a 100 MB pure JSON file - how do I import this into ES? What are the "_index", "_type", and "_id" fields exactly? Thanks for your help, by the way.
That is the way you structure your data. An index is like the main collection. You can have multiple types within an index, but one is usually fine. If you have your own id you can provide it, but you can also omit it.
This is not the structure I would use. For Maximum I would just store the value, not an additional property called DoubleValue. You should be able to use the same line, but then bulk would create an index called test with the type type1 and an id of 1.