I created a Logstash config file with the JDBC input plugin to bring Oracle database tables into Elasticsearch, and I scheduled it to run every five minutes.
It works as expected, but the problem is that it inserts duplicate records on the 2nd, 3rd, and every following run. How can I avoid inserting duplicate records into Elasticsearch?
Here is my Logstash config file with the JDBC input plugin:
input {
  jdbc {
    jdbc_driver_library => "D:\1SearchEngine\data\ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
    jdbc_user => "demo"
    jdbc_password => "1234567"
    schedule => "*/5 * * * *"
    statement => "select * from documents"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
  }
}
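As far as I can tell, the duplicates appear because Elasticsearch auto-generates a new _id for every row on every scheduled run. One fix I am considering (a minimal sketch, assuming the id column reaches the event as a field named id, since the JDBC input lowercases column names by default) is to set document_id in the output, so a re-indexed row overwrites the same document instead of creating a new one:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "schedule1_documents"
    # use the primary key as the Elasticsearch _id, so repeated runs
    # update the existing document rather than inserting a duplicate
    document_id => "%{id}"
  }
}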
And here is my documents table schema:
id        ---> NUMBER, NOT NULL
FileName  ---> VARCHAR2
Path      ---> VARCHAR2
File_size ---> VARCHAR2
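Alternatively, would it be better to stop re-reading old rows in the first place? The JDBC input plugin can persist the last seen value of a column and expose it to the query as :sql_last_value. A sketch of that approach, assuming id only ever increases:

input {
  jdbc {
    jdbc_driver_library => "D:\1SearchEngine\data\ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:XE"
    jdbc_user => "demo"
    jdbc_password => "1234567"
    schedule => "*/5 * * * *"
    # only fetch rows newer than the highest id seen so far;
    # the plugin persists that value between scheduled runs
    statement => "select * from documents where id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
  }
}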