
This question arises from this SO thread.

Since I have a similar but not identical query, @Val suggested it might be best to ask a separate question so others can benefit.

So, similar to the above, I need to insert a massive amount of data into an index (my initial test set is about 10,000 documents, but this is just for a POC; there are many more). The data I would like to insert is in a .json document and looks something like this (snippet):

[ { "fileName": "filename", "data":"massive string text data here" }, 
  { "fileName": "filename2", "data":"massive string text data here" } ]

By my own admission I am new to Elasticsearch; however, from reading through the documentation, my assumption was that I could take a .json file and create an index from the data within. I have since learnt that each item within the JSON needs a "header", something like:

{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }

Meaning that this is not actual JSON (as such) but rather a manipulated, newline-delimited string.
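So, if I understand correctly, for my two sample documents the full request body would become (an action line before each document):

```
{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }
{"index":{}}
{ "fileName": "filename2", "data":"massive string text data here" }
```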

I would like to know if there is a way to import my json data as is (in json format), without having to manually manipulate the text first (as my test data has 10 000 entries, I'm sure you can see why I'd prefer not doing this manually).

Any suggestions or suggested automated tools to help with this?

PS - I am using the Windows installer and Postman for the calls.

2 Answers


You can transform your file very easily with a single shell command. Provided that your file is called input.json:

jq -c -r ".[]" input.json | while read -r line; do echo '{"index":{}}'; echo "$line"; done > bulk.json

After this you'll have a file called bulk.json which is properly formatted to be sent to the bulk endpoint.

Then you can call your bulk endpoint like this:

curl -XPOST localhost:9200/your_index/your_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @bulk.json

Note: You need to install jq first if you don't have it already.


6 Comments

Tried this with a ~64MB file of ~75,000 records; the transformation took about a minute and the load less than 30 seconds.
Is there an alternative solution? For some reason I am not able to get jq working. I have it downloaded, but keep getting a callback saying 'jq' is not recognized when I run this command.
@Val I have tried the jq command within Windows Powershell and am receiving 'Missing statement body in do loop' I think it is because I am not specifying the input file correctly. For my input file I am putting "@C:\setting-es.json" .. I think I am having trouble with the syntax for jq
this worked for me jq -c '.[] | ({"index":{}}, [.])' activity-es-jq.json > bulk-activity.json executed from powershell
@cluis92 any reasons to put it in an array like [.] ? jq -c '.[] | ({"index":{}}, .)' works for me.

This is my code to bulk-index data into ES:


const es = require("elasticsearch");
const client = new es.Client({
  hosts: ["http://localhost:9200"],
});

// load the JSON array from disk
const cities = require(<path to your json file>);

// loop through each city and push two objects into the array per iteration:
// the first object carries the index and type the document will be saved under,
// the second object is the document itself
let bulk: any = [];

cities.forEach((city: any) => {
  bulk.push({
    index: {
      _index: <index name>,
      _type: <type name>,
    },
  });

  bulk.push(city);
});

client.bulk({ body: bulk }, function (err: any, response: any) {
  if (err) {
    console.log("Failed bulk operation", err);
  } else {
    console.log("Successfully imported %s documents", cities.length);
  }
});

Or you can use a library like elasticdump or elasticsearch-tools.
