
I'm trying to prevent duplicate entries in Elasticsearch, where "duplicate" is defined by one of the fields in the data. For example, given an object like the one below, I'd like to prevent another entry that has the same array of event_ids. Any ideas how to do this?

I'm coding this with the Elasticsearch JavaScript API, if that makes any difference.

```javascript
{
  start_date: '2015-11-19T08:46:14-05:00',
  end_date: '2015-11-19T08:46:38-05:00',
  length_seconds: 24,
  number_events: 5,
  event_ids: [ 5589253, 5589254, 5589255, 5589256, 5510380 ]
}
```

Answer

Sort the event IDs and remove duplicates, concatenate them into a single string, hash that string, and use the hash as the document's _id.

For instance: sha1("5510380|5589253|5589254|5589255|5589256")
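A minimal sketch of that ID scheme in Node.js, using only the built-in `crypto` module. The helper name `eventSetId` is hypothetical, the `|` separator follows the example above, and it assumes the event IDs are numbers (hence the numeric sort comparator rather than JavaScript's default string sort):

```javascript
const crypto = require('crypto');

// Build a deterministic document _id from a list of event IDs:
// remove duplicates, sort numerically, join with '|', then SHA-1 the result.
function eventSetId(eventIds) {
  const unique = [...new Set(eventIds)].sort((a, b) => a - b);
  const key = unique.join('|');
  return crypto.createHash('sha1').update(key).digest('hex');
}

const doc = {
  start_date: '2015-11-19T08:46:14-05:00',
  end_date: '2015-11-19T08:46:38-05:00',
  length_seconds: 24,
  number_events: 5,
  event_ids: [5589253, 5589254, 5589255, 5589256, 5510380],
};

// The same events in a different order (with a repeat) map to the same _id.
console.log(
  eventSetId(doc.event_ids) ===
    eventSetId([5510380, 5589256, 5589255, 5589254, 5589253, 5589253])
); // prints true
```

Because the key is built from the sorted, de-duplicated list, input order and repeated IDs do not change the resulting _id.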

With this scheme, any two documents with the same set of events get the same _id, so you can check whether that document already exists before indexing a new one.
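Instead of a separate exists check (which leaves a window where two writers can both see "missing" and both write), you can let Elasticsearch enforce uniqueness: the create API, like indexing with op_type "create", fails with a 409 conflict when a document with that _id already exists. The sketch below assumes the official `@elastic/elasticsearch` client with its v8-style `document` parameter (older clients used `body`); `createIfAbsent` is a hypothetical helper name, and `id` is the hash computed from the sorted event IDs:

```javascript
// Sketch assuming the v8 @elastic/elasticsearch client, created e.g. as:
//   const { Client } = require('@elastic/elasticsearch');
//   const client = new Client({ node: 'http://localhost:9200' });

// Try to create the document under the given _id.
// Returns true if it was created, false if a document with that _id
// already exists (Elasticsearch answers with HTTP 409 in that case).
async function createIfAbsent(client, index, id, document) {
  try {
    await client.create({ index, id, document });
    return true;
  } catch (err) {
    if (err && err.meta && err.meta.statusCode === 409) {
      return false; // same event set already indexed: skip the duplicate
    }
    throw err; // any other error is a real failure
  }
}
```

Usage would look like `await createIfAbsent(client, 'events', id, doc)`, where the index name `events` is only an example.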


Comments

Any ideas for something that uses fewer computational resources than a SHA hash?
Any hash would do; it is only used to shorten the string. A fast one, for instance, is MurmurHash3. If the list isn't very long, the raw string could also work as the _id.
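Following that suggestion, a sketch that uses the sorted, joined event IDs directly as the _id and falls back to a hash only when the string gets too long. It assumes a recent Elasticsearch version, which rejects _id values larger than 512 bytes, and numeric event IDs. SHA-1 from Node's built-in `crypto` stands in for the fallback here because MurmurHash3 is not part of Node's standard library (it would need a third-party package). `docIdFor` and `MAX_ID_BYTES` are hypothetical names:

```javascript
const crypto = require('crypto');

// Assumed limit: recent Elasticsearch versions reject _id values over 512 bytes.
const MAX_ID_BYTES = 512;

// Use the canonical event list itself as the _id when it fits; otherwise hash it.
// The "sha1:" prefix keeps hashed IDs from ever matching a raw key, since raw
// keys built from numeric IDs never contain ':'.
function docIdFor(eventIds) {
  const key = [...new Set(eventIds)].sort((a, b) => a - b).join('|');
  if (Buffer.byteLength(key, 'utf8') <= MAX_ID_BYTES) {
    return key;
  }
  return 'sha1:' + crypto.createHash('sha1').update(key).digest('hex');
}

console.log(docIdFor([5589253, 5589254, 5589255, 5589256, 5510380]));
// prints 5510380|5589253|5589254|5589255|5589256
```

The raw key costs no hashing at all and is readable when debugging; the hash only kicks in for unusually long event lists.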
