How do I avoid elasticsearch duplicate documents?
The elasticsearch index docs count (20,010,253) doesn’t match with logs line count (13,411,790).
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
nifi:
real time nifi pipeline copies logs from nifi server to elk server.
nifi has rolling log files.
logs line count on elk server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
file {
path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
type => "test_4"
sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
}
}
filter {
if [type] == "test_4" {
grok {
match => {
"message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
}
}
}
}
output {
if [type] == "test_4" {
elasticsearch {
hosts => "ip:9200"
index => "test_4"
}
}
else {
stdout {
codec => rubydebug
}
}
}