
I want to set up Postgres and Elasticsearch. But before sending data to Elasticsearch, I want to make sure nothing is lost when the network or a server goes down. After reading up on this topic (https://gocardless.com/blog/syncing-postgres-to-elasticsearch-lessons-learned/), I came up with 3 solutions.

  1. Create a database table, e.g. store, and add any new/updated data to it.

    • On every write: insert the data into store.
    • Select new data: SELECT data FROM store WHERE modified > :last_modified, where :last_modified is the latest modified time already in Elasticsearch.
    • Send the "new" data over to Elasticsearch.
  2. Use Redis pub/sub: publish every write and have Elasticsearch (through a subscriber) listen for incoming data. If Elasticsearch breaks, the data will be in the queue.

  3. Catch any errors while writing to Elasticsearch and save the data in a safe place (e.g. the store table mentioned above). Then have a cron job push this data again.
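Option 3 could look roughly like this. It is a minimal sketch under stated assumptions, not a production setup: sqlite3 stands in for Postgres, FlakyIndex is a hypothetical in-memory stand-in for an Elasticsearch client (not the real client API), and the store schema and the save/replay functions are illustrative names.

```python
import sqlite3


class FlakyIndex:
    """Hypothetical stand-in for an Elasticsearch client (not the real API).

    Documents live in a dict; while `up` is False every call raises
    ConnectionError, simulating a network or cluster outage."""

    def __init__(self):
        self.docs = {}
        self.up = True

    def index(self, doc_id, body):
        if not self.up:
            raise ConnectionError("cluster unreachable")
        self.docs[doc_id] = body


def create_tables(conn):
    # `store` parks documents that could not be indexed yet (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store (doc_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )


def save(conn, es, doc_id, body):
    """Index straight away; on failure park the document in `store`."""
    try:
        es.index(doc_id, body)
    except ConnectionError:
        # INSERT OR REPLACE keeps only the latest parked version per document.
        conn.execute(
            "INSERT OR REPLACE INTO store (doc_id, body) VALUES (?, ?)", (doc_id, body)
        )
    else:
        # The direct write succeeded, so an older parked copy must not be replayed later.
        conn.execute("DELETE FROM store WHERE doc_id = ?", (doc_id,))
    conn.commit()


def replay(conn, es):
    """Cron job: re-send parked documents, deleting each one once indexed."""
    pushed = 0
    rows = conn.execute("SELECT doc_id, body FROM store ORDER BY doc_id").fetchall()
    for doc_id, body in rows:
        try:
            es.index(doc_id, body)
        except ConnectionError:
            break  # still down; the next cron run will retry
        conn.execute("DELETE FROM store WHERE doc_id = ?", (doc_id,))
        conn.commit()
        pushed += 1
    return pushed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    es = FlakyIndex()
    save(conn, es, 1, "first")
    es.up = False
    save(conn, es, 2, "second")  # outage: parked in `store`
    es.up = True
    print(replay(conn, es), es.docs)
```

One detail worth keeping in any real version: when a direct write succeeds, an older parked copy of the same document is dropped; otherwise the cron replay could overwrite the newer version with a stale one.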


Of course the easiest thing would be to insert data into Elasticsearch straight away, but then the data is not kept in a safe place when something fails. In my opinion 1 is too slow, unlike 2, and 3 requires maintaining error-handling code.

For now, I'm going with 2.


Are there better ways to do this? I'd like to hear your opinions and new suggestions.

:D

  • Just curious, how would you handle DELETEs for case 1? I was just exploring options to do exactly what you are doing... Commented Sep 26, 2016 at 17:38
  • Also, check this out... qafoo.com/blog/… Commented Sep 26, 2016 at 17:57
  • @FacePalm see answer Commented Sep 26, 2016 at 19:36
  • Here's my new question: stackoverflow.com/questions/39757377/… Commented Sep 29, 2016 at 23:16


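Redis (2) isn't reliable: Redis pub/sub does not store messages, so anything published while no subscriber is connected is simply lost.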

What I decided to do was add data to Elasticsearch straight away and also add it to an updates table. Then I run a sync() function right after connecting the Elasticsearch client (in case the cluster went down earlier), plus a cron job every 24 hours that launches sync(). All sync() does is select the newest record (by time or id) from updates (A) and from Elasticsearch (B) and check whether A > B. If so, it inserts the missing data using the bulk API.
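The sync() step could be sketched like this, under a few assumptions: sqlite3 stands in for Postgres, updates has an auto-incrementing id, and FakeIndex is a hypothetical in-memory stand-in for the Elasticsearch index that stores the update id with each document. In a real cluster, B could come from a max aggregation on that field and the writes would go through the Bulk API; the function and table names here are illustrative.

```python
import sqlite3


class FakeIndex:
    """Hypothetical in-memory stand-in for an Elasticsearch index (not the real API).

    Each document keeps the `updates.id` it was built from, so the newest
    indexed id (B) can be read back."""

    def __init__(self):
        self.docs = {}

    def max_update_id(self):
        # In a real cluster this could be a `max` aggregation on the update id field.
        return max((d["update_id"] for d in self.docs.values()), default=0)

    def bulk(self, actions):
        # Stand-in for a single Bulk API request carrying many documents.
        for doc_id, doc in actions:
            self.docs[doc_id] = doc


def create_tables(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS updates ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " doc_id INTEGER NOT NULL,"
        " body TEXT NOT NULL)"
    )


def record_update(conn, doc_id, body):
    """Append every write to `updates` and return its id."""
    cur = conn.execute(
        "INSERT INTO updates (doc_id, body) VALUES (?, ?)", (doc_id, body)
    )
    conn.commit()
    return cur.lastrowid


def sync(conn, index):
    """Push every update newer than the newest one already in the index."""
    (a,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM updates").fetchone()  # A
    b = index.max_update_id()  # B
    if a <= b:
        return 0  # nothing missing
    rows = conn.execute(
        "SELECT id, doc_id, body FROM updates WHERE id > ? ORDER BY id", (b,)
    ).fetchall()
    # ORDER BY id: a later update to the same document overwrites an earlier one.
    index.bulk([(doc_id, {"update_id": uid, "body": body}) for uid, doc_id, body in rows])
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    index = FakeIndex()
    uid = record_update(conn, 1, "a")
    index.bulk([(1, {"update_id": uid, "body": "a"})])  # direct write succeeded
    record_update(conn, 2, "b")  # direct write lost: cluster was down
    print(sync(conn, index))
```

Comparing only the newest ids assumes writes reach the index in order: if an older direct write failed while a newer one succeeded, B already equals A and sync() does not detect the gap.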

Hope this helps :)

And I am still open to suggestions and feedback...
