
I've seen a few similar posts here on Stack Overflow, but I still don't have a clear understanding of how to index a large file of JSON documents into Elasticsearch; I'm getting errors like the following:

{"error":"ActionRequestValidationException[Validation Failed: 1: index is missing;2: type is missing;]","status":400}

{"took":231,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":7,"status":200}}]

I have a JSON file that is about 2 GB in size, which is the file I actually want to import. But first, in order to understand how the Bulk API works, I created a small file with just a single line of actual data:

testfile.json

{"index":{"_id":"someId"}} \n
{"id":"testing"}\n

I got this from another post on SO. I understand that the first line is a header, and that the "index" in the first line is the action that will be sent to ES; however, this still does not work. Can someone please give me a working example and a clear explanation of how to import a JSON file into ES?

Thank you!

1 Answer


The following sample comes from the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html?q=bulk

{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }

So line one tells Elasticsearch to index the document on line two into the index test, type type1, with _id 1. The document it indexes contains field1. If all your documents go to the same index and type, you can put them in the URL instead and leave them out of each action line. Check the link for samples.
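For example, with a file like your testfile.json (where the action lines only contain _id), you could specify the index and type in the URL. The following is just a sketch that assumes a node on localhost:9200 and the index/type names from the sample above; adjust them to your own:

# Send the bulk file to a URL that already names the index and type,
# so _index and _type can be omitted from each action line.
# --data-binary keeps the newlines intact (-d would strip them and break
# the bulk format), and the file must end with a newline.
curl -s -XPOST 'http://localhost:9200/test/type1/_bulk' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @testfile.json

Recent Elasticsearch versions require the Content-Type header for bulk requests; older versions simply accept the request without checking it.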

Line three shows an example of a delete action; a delete does not need a document on the following line.

Be careful with very large files; 2 GB is probably too big. The whole request needs to be sent to Elasticsearch, which loads it into memory, so there is a limit to the number of records you can send in one bulk request.
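A common workaround is to split the file and send it in several smaller bulk requests. This is only a sketch: it assumes the file is already in bulk format, that every action line is followed by a document line, and it reuses the hypothetical index/type names and local node from above:

# 10000 lines = 5000 action/document pairs per request; an even line count
# keeps each action line together with its document.
split -l 10000 bulkfile.json bulk_chunk_

for chunk in bulk_chunk_*; do
  curl -s -XPOST 'http://localhost:9200/test/type1/_bulk' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary @"$chunk"
  echo    # newline between the responses
done

A few thousand documents per request is usually a safer starting point than one huge request; you can raise the chunk size if the responses stay fast.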


Comments

This is the error I get when I use the first two lines in this example: {"took":36,"errors":false,"items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":8,"status":200}}]} and I still don't have a clear understanding of why there is an "_index", "_type", and "_id" field for a document that contains "field1" and "value1".
Can you update your question with the code you use to execute the bulk? Do you use curl or something else?
Hi Jettro; apparently that was actually not an error - it's the correct output! However, I have no idea what that output means. I understand 2 GB may be too large to index all at once, but say I have a 100 MB pure JSON file - how do I import this into ES? What are the "_index", "_type", and "_id" fields exactly? Thanks for your help, by the way.
That is the way you structure your data. An index is like the main collection. You can have multiple types within an index, but one is usually fine. If you have your own id you can provide it, but you can also omit it.
This is not the structure I would use. For Maximum I would just store the value, not an additional property called DoubleValue. You should be able to use the same line, but then bulk would create an index called test with the type type1 and an id of 1.