Aim: keep Elasticsearch in sync with a Postgres database
Why: the network or the cluster/server sometimes breaks, so updates made in the meantime should be recorded and applied later

This article https://qafoo.com/blog/086_how_to_synchronize_a_database_with_elastic_search.html suggests creating a separate updates table whose ids are mirrored in Elasticsearch, which makes it possible to select the new data (from the database) since the last record (in Elasticsearch). So I thought: what if I track Elasticsearch's failed and successful connections? If the client pongs back successfully (the ping promise resolves), I could launch a function to sync the records with my database.

Here's my elasticConnect.js

import elasticsearch from 'elasticsearch'
import syncProcess from './sync'

const client = new elasticsearch.Client({
  host:  'localhost:9200',
  log: 'trace'
});


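// Ping the cluster; once it answers, start syncing with the database.
client.ping({
  requestTimeout: Infinity,
  hello: "elasticsearch!"
})
.then(() => syncProcess()) // successful connection: call syncProcess rather than just referencing it
.catch(err => console.error(err))


export default client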

This way, I don't even need to worry about running a cron job (if my assumption in question 1 is correct), since I know that the cluster is running.

Questions

  1. Will syncProcess run before export default client? I don't want any requests coming in while syncing...

  2. syncProcess should run only once (since the module is cached and syncProcess is not exported), no matter how many times I import elasticConnect.js. Correct?

  3. Are there any advantages to using the updates table, instead of just selecting data from the parent/source table?

  4. The article's comments say "don't use timestamp to compare new data!". Ehhh... why? It should be OK since the database is blocking, right?

1 Answer

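For 1: As it stands, you have no guarantee that syncProcess will have run by the time the client is exported. The export happens as soon as the module is evaluated, while the ping and the sync complete asynchronously later. Instead you should do something like in this answer and export a promise.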
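A minimal sketch of the "export a promise" idea, not the linked answer's exact code. `connectAndSync` is a hypothetical helper name, and it assumes that `syncProcess` is a function returning a promise and that `client.ping()` returns a promise, as in the question.

```javascript
// Hypothetical helper: resolves to the client only after the ping
// has succeeded and syncProcess has finished.
function connectAndSync(client, syncProcess) {
  return client
    .ping({ requestTimeout: 30000 })
    .then(() => syncProcess(client)) // wait for the sync to complete
    .then(() => client);             // only now hand the client out
}

// In elasticConnect.js this could replace `export default client`:
//   export default connectAndSync(client, syncProcess)
//
// Consumers then wait for the sync before sending requests:
//   import ready from './elasticConnect'
//   ready.then(client => client.search({ q: 'title:test' }))
```

Errors are deliberately not caught inside the helper, so a failed ping or sync rejects the exported promise and every consumer can decide how to react.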

For 2: With the solution I linked to above, this is taken care of as well.
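The reason is that a module is evaluated once and then cached, so a promise created at the top level of elasticConnect.js is shared by every importer. The sketch below makes that behaviour explicit with a hypothetical `once` helper; the names `once`, `getReady` and `syncRuns` are illustrative only.

```javascript
// Hypothetical helper: runs `start` on the first call and
// returns the same promise on every later call.
function once(start) {
  let pending = null;
  return function getReady() {
    if (pending === null) {
      pending = start();
    }
    return pending;
  };
}

// Stand-in for "ping, then sync"; counts how often the sync really starts.
let syncRuns = 0;
const getReady = once(() => {
  syncRuns += 1;
  return Promise.resolve('client-ready');
});

const first = getReady();
const second = getReady(); // reuses the promise, does not sync again
```

Note that a rejected promise is cached as well, so retrying after a failed first attempt would need extra handling.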

For 3: An updates table would also catch record deletions, while simply selecting from the DB would not, since you don't know which records have disappeared.
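As a rough illustration, rows from such an updates table could be turned into the body of an Elasticsearch bulk request like this. The row shape (`doc_id`, `action`, `document`) and the helper name `toBulkBody` are assumptions made for the example, not something the article prescribes.

```javascript
// Assumed row shape in the updates table:
//   { sequence_id, doc_id, action: 'upsert' | 'delete', document }
function toBulkBody(updateRows, indexName) {
  const body = [];
  for (const row of updateRows) {
    if (row.action === 'delete') {
      // A delete is a single action line with no document after it.
      body.push({ delete: { _index: indexName, _id: row.doc_id } });
    } else {
      // An index action line is followed by the document itself.
      body.push({ index: { _index: indexName, _id: row.doc_id } });
      body.push(row.document);
    }
  }
  return body;
}

const sampleBody = toBulkBody(
  [
    { sequence_id: 41, doc_id: '5', action: 'upsert', document: { title: 'new' } },
    { sequence_id: 42, doc_id: '7', action: 'delete', document: null }
  ],
  'articles'
);
```

The resulting array could then be sent with `client.bulk({ body: sampleBody })`. Older Elasticsearch versions also expect a `_type` in each action line.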

For 4: The second comment after the article you linked to provides the answer (hint: timestamps are not strictly monotonic).
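A small made-up example of what can go wrong, using in-memory rows rather than a real database: two rows share the same timestamp, and the sync ran after the first was written but before the second was committed. The data and function names are invented for illustration.

```javascript
// Invented sample data: 'a' and 'b' were stamped in the same millisecond.
const changeRows = [
  { seq: 1, id: 'a', updatedAt: 1000 },
  { seq: 2, id: 'b', updatedAt: 1000 },
  { seq: 3, id: 'c', updatedAt: 1001 }
];

// Timestamp-based selection: everything strictly newer than the last seen time.
function changedSinceTimestamp(rows, lastSeenAt) {
  return rows.filter(row => row.updatedAt > lastSeenAt);
}

// Sequence-based selection: everything after the last synced sequence id.
function changedSinceSequence(rows, lastSeq) {
  return rows.filter(row => row.seq > lastSeq);
}

// The previous sync only saw row 'a' (seq 1, updatedAt 1000).
const byTimestamp = changedSinceTimestamp(changeRows, 1000).map(r => r.id); // ['c'], 'b' is lost
const bySequence = changedSinceSequence(changeRows, 1).map(r => r.id);      // ['b', 'c']
```

Switching the timestamp comparison to `>=` avoids losing 'b' but re-sends 'a' on every run, which is why a strictly increasing sequence id is the simpler choice.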

1 Comment

Question: the article only mentions "in order to sync, run cron job". But this can't be right, since last_squence_id from Elasticsearch will have changed by then, so older updates would be missed. So I need to make sure that I sync before inserting data into Elasticsearch, correct?