
I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. I have the following sample data inside the JSON file:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using:

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json

When I try to use the standard bulk API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

  • Can you tell me the index request you are using? Commented Oct 26, 2015 at 7:06
  • @KumarKailash here is the request: curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json Commented Oct 26, 2015 at 7:21
  • Try this way: stackoverflow.com/a/65213529/3357884 Commented Dec 9, 2020 at 9:01

4 Answers

112

What you need to do is read that JSON file and then build a bulk request in the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, each terminated by a newline character. Rinse and repeat for each document:

curl -XPOST localhost:9200/your_index/_bulk -H 'Content-Type: application/x-ndjson' -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.
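If you would rather generate this body in code than write it by hand, here is a minimal Python sketch using only the standard library. It assumes the documents are already loaded as a list of dicts and that the Id field should become _id, as in the example above; build_bulk_body is a hypothetical helper name. Passing doc_type=None leaves out _type, which is what you want on ES 7.x and later.

```python
import json


def build_bulk_body(docs, index, doc_type=None, id_field="Id"):
    """Return an NDJSON _bulk body: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            # _type only applies to pre-7.x clusters
            meta["_type"] = doc_type
        if id_field in doc:
            # Reuse the document's own Id as the Elasticsearch _id;
            # without it, Elasticsearch generates an id itself
            meta["_id"] = doc[id_field]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
print(build_bulk_body(docs, "your_index", "your_type"), end="")
```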

UPDATE

Note that the command line can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0, though). At the very least, your command line can look like {"index":{}} for all documents, but it is always mandatory to specify which kind of operation you want to perform (in this case, index the document).

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
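Since the file in the question is a single JSON array, it has to be rewritten into this one-object-per-line form first. A minimal Python sketch, assuming the whole array fits in memory; json_array_to_bulk_ndjson and convert_file are hypothetical names, and the string conversion is kept separate from the file handling so it is easy to check on its own.

```python
import json


def json_array_to_bulk_ndjson(array_text):
    """Turn the text of a JSON array into a _bulk body with {"index":{}} actions."""
    docs = json.loads(array_text)  # expects a top-level JSON array
    lines = []
    for doc in docs:
        lines.append('{"index":{}}')
        # Compact separators keep each document on a single line
        lines.append(json.dumps(doc, separators=(",", ":")))
    # Every line, including the last one, must be terminated by a newline
    return "".join(line + "\n" for line in lines)


def convert_file(src_path, dst_path):
    """Read a JSON array file and write the equivalent _bulk file."""
    with open(src_path, encoding="utf-8") as src:
        body = json_array_to_bulk_ndjson(src.read())
    # newline="\n" avoids \r\n translation on Windows
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(body)
```

For example, convert_file("/home/data1.json", "/home/data1_bulk.json") (the output path is only an illustration) and then pass @/home/data1_bulk.json to curl's --data-binary.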

UPDATE 3

You can refer to this answer to see how to generate the new json style file mentioned in UPDATE 2.

UPDATE 4

As of ES 7.x, the doc_type is not necessary anymore and should simply be _doc instead of my_doc_type. As of ES 8.x, the doc type will be removed completely. You can read more about this here


18 Comments

The command line is always mandatory for each document. If you add the index and type name in your URL (i.e. localhost:9200/your_index/your_type/_bulk), you can remove _index and _type from the command line to shorten it. There's also a way to not have to specify _id but at the very least, you'll always need to specify what operation you want to perform with the document, i.e. the shortest you can do is {"index":{}}
You don't need to specify your JSON objects inside an array (i.e. [...]) and no commas between documents, just one JSON per line with newline characters at the end of each line (don't forget a newline after the last line). I've updated my answer with your latest code.
@Val Am I correct, then, in saying that you cannot simply pass in a .json object and that it needs to be parsed/transformed first (i.e. each item on its own line, with an added index header for each item)? If so, is there a known tool that can do this automatically? I ask because I have a JSON file that contains 10 000 items; I assumed I would be able to pass the entire document in, but I was quickly corrected.
@Hexie in your case, you can use UPDATE 2 above and a shell script one-liner to update your file and add the header line.
@Val Fair suggestion - new question created: stackoverflow.com/questions/45601344/…
16

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The format of the data that should be present in mydata.json remains the same as shown in @val's answer
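Outside of curl, the same request can be assembled with Python's standard library. A minimal sketch: the host http://localhost:9200 and the index name my_index are assumptions, and make_bulk_request is a hypothetical helper that only builds the request; sending it (for example with urllib.request.urlopen) is left to the caller. The Content-Type is application/x-ndjson with no charset parameter.

```python
import urllib.request


def make_bulk_request(body, index, host="http://localhost:9200"):
    """Build (but do not send) a POST to /<index>/_bulk with an NDJSON body."""
    if not body.endswith("\n"):
        # Elasticsearch requires the bulk body to end with a newline
        body += "\n"
    return urllib.request.Request(
        url=f"{host}/{index}/_bulk",
        data=body.encode("utf-8"),
        # No charset parameter: one was reported to cause HTTP 406
        # (elastic/elasticsearch issue 28123)
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```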

1 Comment

A side note on Content-Type: do not include a charset in it, because a bug makes Elasticsearch return HTTP 406: github.com/elastic/elasticsearch/issues/28123
4

A valid Elasticsearch bulk API request would look something like this (note that it must end with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"} 
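A body that does not follow this action-line/document-line layout is one way to end up with the "no requests added" error from the question, so it can help to check the body before sending it. A rough Python sketch: check_bulk_body is a hypothetical helper (not part of Elasticsearch or any client), it only knows the index, create, update and delete actions, and it does not validate the document lines themselves.

```python
import json

# index/create/update lines must be followed by a document line; delete must not
NEEDS_SOURCE = {"index": True, "create": True, "update": True, "delete": False}


def check_bulk_body(body):
    """Return a list of problems found in an NDJSON _bulk body (empty if none)."""
    problems = []
    if not body.endswith("\n"):
        problems.append("body must end with a newline")
    # Keep the real 1-based line numbers while skipping blank lines
    lines = [(n, text) for n, text in enumerate(body.split("\n"), 1) if text.strip()]
    i = 0
    while i < len(lines):
        n, text = lines[i]
        try:
            action = json.loads(text)
        except json.JSONDecodeError:
            action = None
        if not (isinstance(action, dict) and len(action) == 1
                and next(iter(action)) in NEEDS_SOURCE):
            problems.append(f'line {n}: expected an action line such as {{"index":{{}}}}')
            i += 1
            continue
        op = next(iter(action))
        if NEEDS_SOURCE[op]:
            if i + 1 >= len(lines):
                problems.append(f"line {n}: '{op}' is missing its document line")
            i += 2  # skip the action line and its document line
        else:
            i += 1
    return problems
```

For the array file from the question, every line is reported as not being a valid action line.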

Elasticsearch bulk API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

This is how I do it:

I send a POST HTTP request where the uri variable holds the request URI and the elasticsearchJson variable holds the request body, formatted for the Elasticsearch bulk API:

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method for generating the JSON format required by the Elasticsearch bulk API:

public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

The first property in my JSON objects is RequestedCountry, which is why I use it in this example.

productModel is my Elasticsearch document type. sqlResult is a C# generic list with products.

Comments

3

This answer is for Elasticsearch 7.x onwards, where _type is deprecated. As others have mentioned, you can read the file programmatically and construct a request body as described below. Also, I see that each of your JSON objects has an Id attribute, so you could set the document's internal id (_id) to the same value. The updated _bulk API request would look like this:

HTTP Method: POST

URI: /<index_name>/_bulk

Request body (should end with a new line):

{"index":{"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}

Comments
