
I'm trying to import a large JSON document into Elasticsearch 5.1. A small section of the data looks like this:

[
    {
      "id": 1,
      "region": "ca-central-1",
      "eventName": "CreateRole",
      "eventTime": "2016-02-04T03:41:19.000Z",
      "userName": "[email protected]"
    },
    {
      "id": 2,
      "region": "ca-central-1",
      "eventName": "AddRoleToInstanceProfile",
      "eventTime": "2016-02-04T03:41:19.000Z",
      "userName": "[email protected]"
    },
    {
      "id": 3,
      "region": "ca-central-1",
      "eventName": "CreateInstanceProfile",
      "eventTime": "2016-02-04T03:41:19.000Z",
      "userName": "[email protected]"
    },
    {
      "id": 4,
      "region": "ca-central-1",
      "eventName": "AttachGroupPolicy",
      "eventTime": "2016-02-04T01:42:36.000Z",
      "userName": "[email protected]"
    },
    {
      "id": 5,
      "region": "ca-central-1",
      "eventName": "AttachGroupPolicy",
      "eventTime": "2016-02-04T01:39:20.000Z",
      "userName": "[email protected]"
    }
]

I'd like to import the data without making any changes to the source file if possible, so I believe that rules out the _bulk command, as I'd need to add additional details for each entry.
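If I understand the bulk format correctly, every record would need to be flattened onto a single line and preceded by an action line, something like this (using the index and type from my curl call below):

{ "index": { "_index": "rea", "_type": "test" } }
{ "id": 1, "region": "ca-central-1", "eventName": "CreateRole", "eventTime": "2016-02-04T03:41:19.000Z", "userName": "[email protected]" }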

I've tried several different methods but have not had any luck. Am I wasting my time trying to import this document as-is?

I've tried:

curl -XPOST 'demo.ap-southeast-2.es.amazonaws.com/rea/test' --data-binary @Records.json

But that fails with an error:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"not_x_content_exception","reason":"Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes"}},"status":400}

Thanks!

2 Answers


If you don't want to modify the file, the bulk API will not work.

You can have a look at jq, a command-line JSON processor. It can help you generate the payload required by the bulk API:

cat Records.json | 
jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
. '

You can try something like this and pass the output to the bulk API. Hope this helps.

You can also pipe the output straight into a curl call, which would be something like this:

cat Records.json |
jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
. ' | curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk --data-binary @-

I haven't tried the second part, but it should work.
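One caveat: stricter content-type checking in newer Elasticsearch releases means you may also need to pass an explicit Content-Type header on bulk requests. If the call above is rejected with a content-type error, try something like:

cat Records.json |
jq -c '
.[] |
{ index: { _index: "index_name", _type: "type_name" } },
. ' | curl -XPOST demo.ap-southeast-2.es.amazonaws.com/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @-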


1 Comment

Thank you for replying. I will try these options and report back!

You might want to check out stream2es - it's a helpful utility for sending documents to Elasticsearch. I think it may do what you need.

Once you have it installed, you should be able to use it something like this:

cat Records.json | ./stream2es stdin --target 'http://demo.ap-southeast-2.es.amazonaws.com/rea/test'
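Once the import finishes, you should be able to sanity-check the document count against the index with a query like:

curl 'demo.ap-southeast-2.es.amazonaws.com/rea/_count'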

1 Comment

Thank you for replying. I will try stream2es as well!
