Create Document in Elasticsearch Without Duplicate Parameters

Question

I'm trying to prevent duplicate entries in Elasticsearch, where a duplicate is defined by one of the fields in the data. For example, given an object like the one below, I'd like to prevent another entry that has the same array of event_ids. Any ideas how to do this?

I'm using the Elasticsearch JavaScript API, if it makes any difference.

{
  start_date: '2015-11-19T08:46:14-05:00',
  end_date: '2015-11-19T08:46:38-05:00',
  length_seconds: 24,
  number_events: 5,
  event_ids: [ 5589253, 5589254, 5589255, 5589256, 5510380 ]
}

Henrik Nordvik · Accepted Answer · 2016-01-09 00:10:46Z


You can concatenate all the event ids into one string, hash it, and use the hash as the id of the document. Sort the ids and remove duplicates first, so that the same set of events always produces the same string.

For instance: sha1("5510380|5589253|5589254|5589255|5589256")

That way, a document with the same events gets the same _id. You can then check whether a document with that _id already exists before indexing.
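The approach above could be sketched in Node.js roughly as follows. This is a minimal sketch, not the answerer's own code: the helper names `canonicalEventKey` and `eventSetId` are hypothetical, and it assumes the event ids are plain numbers and that Node's built-in `crypto` module is available.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: build one canonical string for a set of event ids.
// Duplicates are removed and the ids are sorted numerically, so the same
// set of events yields the same string regardless of input order.
function canonicalEventKey(eventIds) {
  const unique = Array.from(new Set(eventIds));
  unique.sort((a, b) => a - b); // the default sort would compare as strings
  return unique.join("|");
}

// Hypothetical helper: hash the canonical string to get a fixed-length
// value that can be used as the document's _id.
function eventSetId(eventIds) {
  return crypto
    .createHash("sha1")
    .update(canonicalEventKey(eventIds))
    .digest("hex");
}

// The ids from the question, in their original order.
const id = eventSetId([5589253, 5589254, 5589255, 5589256, 5510380]);
console.log(id.length); // SHA-1 in hex is always 40 characters
```

The resulting value would then be passed as the document id when indexing. Elasticsearch's create operation (`op_type=create`) rejects a write with a 409 conflict when a document with that _id already exists, which can stand in for a separate existence check.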


2 Comments

Rob Over a year ago

Any ideas for something that uses less computational resources than a SHA hash?

Henrik Nordvik Over a year ago

Any hash would do; it is only used to shorten the string. A fast option is murmurhash3, for instance. If the list isn't very long, the raw string could also work.
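As a rough illustration of the raw-string option, the sketch below uses the concatenated ids directly as the _id when they are short and falls back to a hash otherwise. The function name `documentIdFor` is hypothetical, and the 512-byte threshold is taken from Elasticsearch's documented size limit for the _id field; SHA-1 is used for the fallback only because it ships with Node, and any other hash could be swapped in.

```javascript
const crypto = require("node:crypto");

// Elasticsearch documents the _id field as limited to 512 bytes.
const MAX_ID_BYTES = 512;

// Hypothetical helper: use the raw canonical string as the _id when it
// fits, otherwise shorten it with a hash.
function documentIdFor(eventIds) {
  const key = Array.from(new Set(eventIds))
    .sort((a, b) => a - b) // numeric sort for a stable order
    .join("|");
  if (Buffer.byteLength(key, "utf8") <= MAX_ID_BYTES) {
    return key; // short enough to use as-is
  }
  return crypto.createHash("sha1").update(key).digest("hex");
}

console.log(documentIdFor([5589253, 5589254, 5589255, 5589256, 5510380]));
```

For the five ids in the question the raw string is well under the limit, so no hashing happens at all.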
