
I have a CSV file in which one column may contain multi-line values.

ID,Name,Address
1, ABC, "Line 1
Line 2
Line 3"

As far as I know, the data above is a single record according to the CSV standard.

I have the following Logstash configuration:

filter {
  csv {
    separator => ","
    quote_char => "\""
    columns => ["ID", "Name", "Address"]
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => "9200"
    index => "TestData"
    protocol => "http"
  }
  stdout {}
}

But when I run it, it creates three records. All of them are wrong: the first contains ID, Name, and only part of Address, while the next two contain Line 2 and Line 3 but no ID or Name.

How can I fix this? Am I missing something in the file parsing?

1 Answer


Have you tried the multiline codec?

You should add something like this in your input plugin:

codec => multiline {
  pattern => "^[0-9]"
  negate => "true"
  what => "previous"
}

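This tells Logstash that every line that does not start with a digit should be merged with the previous line, so the csv filter receives the whole record as one event instead of one event per physical line.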
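For context, here is a minimal sketch of where the codec goes, assuming the CSV is read with the file input. The path is a placeholder and `start_position` is optional; neither comes from the question. The csv filter and the outputs stay as they are.

```
input {
  file {
    # Hypothetical path; point this at the actual CSV file.
    path => "/path/to/data.csv"
    # Assumption: read the file from the start rather than tailing it.
    start_position => "beginning"
    codec => multiline {
      # A line starting with a digit begins a new record...
      pattern => "^[0-9]"
      # ...and any line that does NOT match is appended to the previous line.
      negate => true
      what => "previous"
    }
  }
}
```

Two caveats with this approach: a continuation line that happens to start with a digit (for example, a street number) would be treated as a new record, and the codec cannot know a record is complete until it sees the next matching line, so the last record in the file may not be emitted right away.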


1 Comment

I went with "^([0-9]+,)" to be a bit more specific
