I want to import some tables from a Postgres database into Elasticsearch and also keep those tables in sync with the data in Elasticsearch. I have looked at a Udemy course and talked with a colleague who has a lot of experience with this issue to find out the best way to do it. I was surprised to hear from both of them that the best way seems to be writing code in Python, Java, or some other language that handles the import and the sync, which brings me to my question: is this actually the best way to handle this situation? It seems like there would be a library, plugin, or something else that handles importing data into Elasticsearch and keeping it in sync with an external database. What is the best way to handle this situation?
-
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – Mickael, Mar 22, 2018 at 11:07
-
Sorry, I know, and I'm selfish, so I put this question up. – slipperypete, Mar 22, 2018 at 14:04
-
I downvoted because this question doesn't show any research effort (idownvotedbecau.se/noresearch). I searched Google for "elasticsearch postgresql" and found in the top results that an Elasticsearch JDBC importer existed. Then I searched "elasticsearch jdbc postgresql" and it gave me the link alkis indicated in his answer. It took me 1 minute... – Mickael, Mar 22, 2018 at 14:55
3 Answers
It depends on your use case. A common practice is to handle this at the application layer: basically, you replicate the actions of one database to the other. So, for example, if you save an entry in Postgres, you do the same in Elasticsearch.
If you do this, however, you'll need a queuing system in place. The queue can be integrated into your application layer, so that if the save to Elasticsearch fails, you can replay the operation. On top of that, your queuing system should implement a throttling mechanism so you don't overwhelm Elasticsearch. Another approach is to send events to another app (e.g. Logstash), so that throttling and persistence are handled by that system rather than your application. A minimal sketch of the application-layer approach follows.
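For concreteness, here is a minimal Python sketch of that dual-write-plus-queue idea. It assumes psycopg2 and the official elasticsearch-py client (7.15+, which accepts `document=`); the `products` table/index, the connection details, and the in-process `queue.Queue` (a stand-in for a durable broker like RabbitMQ or Kafka) are all invented for illustration:

```python
import queue

import psycopg2
from elasticsearch import Elasticsearch

pg = psycopg2.connect("dbname=shop user=app")  # hypothetical DSN
es = Elasticsearch("http://localhost:9200")
retry_queue = queue.Queue()  # stand-in for a durable broker

def save_product(product_id, name, price):
    # 1. Write to Postgres first; it remains the source of truth.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO products (id, name, price) VALUES (%s, %s, %s) "
            "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, "
            "price = EXCLUDED.price",
            (product_id, name, price),
        )
    # 2. Replicate the same write to Elasticsearch.
    doc = {"name": name, "price": price}
    try:
        es.index(index="products", id=product_id, document=doc)
    except Exception:
        # 3. On failure, enqueue the operation for replay instead of
        #    losing it; a background worker drains this queue with
        #    backoff, which is also where the throttling lives.
        retry_queue.put(("products", product_id, doc))
```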
Another approach is the one described in https://www.elastic.co/blog/logstash-jdbc-input-plugin: a separate system "polls" your database and sends the changes to Elasticsearch. In this case Logstash is ideal, since it's part of the ELK stack and has great integration. See also https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html.
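To give a rough idea, here is an (untested) pipeline config for the JDBC input, using its incremental-polling support via `:sql_last_value`. The connection string, driver path, and the `products`/`updated_at` names are placeholders; note that polling like this picks up inserts and updates but not deletes:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/shop"
    jdbc_user => "app"
    jdbc_driver_library => "/path/to/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "* * * * *"  # poll once a minute
    statement => "SELECT * FROM products WHERE updated_at > :sql_last_value"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "products"
    document_id => "%{id}"  # reuse the table's primary key
  }
}
```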
Another approach is to use the NOTIFY mechanism of Postgres to send events to some queue, which then handles saving the changes in Elasticsearch.
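Here is a hedged sketch of that pattern in Python, using psycopg2's LISTEN support. It assumes you have already created a trigger on the table that calls `pg_notify('es_sync', NEW.id::text)` after each insert/update; the channel, table, and column names are invented for the example, and deletes would need their own handling:

```python
import select

import psycopg2
import psycopg2.extensions
from elasticsearch import Elasticsearch

pg = psycopg2.connect("dbname=shop user=app")  # hypothetical DSN
pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
es = Elasticsearch("http://localhost:9200")

with pg.cursor() as cur:
    cur.execute("LISTEN es_sync;")

while True:
    # Wait until Postgres has a notification for us (5 s timeout).
    if select.select([pg], [], [], 5) == ([], [], []):
        continue
    pg.poll()
    while pg.notifies:
        note = pg.notifies.pop(0)
        row_id = note.payload
        # Re-read the row and index its current version.
        with pg.cursor() as cur:
            cur.execute("SELECT name, price FROM products WHERE id = %s",
                        (row_id,))
            row = cur.fetchone()
        if row:
            es.index(index="products", id=row_id,
                     document={"name": row[0], "price": row[1]})
```

In practice you'd put a real queue between this listener and Elasticsearch, as described above, so notifications aren't lost while the indexer is down.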
There is a more recent tool called "abc", developed by appbase.io. Its performance is in a different league from Logstash's:
- abc is written in Go
- Logstash runs on JRuby

Anybody who's ever used Logstash knows that it takes at least 20 seconds just to start. The same basic table-import task from PostgreSQL to Elasticsearch takes ~1 min with Logstash and 5 seconds with abc.
Pros:
- Performance
- Performance
- Simplicity (no config to write)

Cons:
- Better suited to one-shot imports; the daemon mode is limited
- Fewer middleware options than Logstash's filters: you have to write a transform.js file that changes events by hand
As with anything in life, "best" is subjective. Your colleague likes to write and maintain code to keep this in sync; there's nothing wrong with that.
I would say the best way is to use a data pipeline. There's a plethora of choices, really overwhelming; you can explore the various solutions that support Postgres and Elasticsearch. Here are the options I'm familiar with.
Note that these are tools/platforms for your solution, not the solution itself. YOU have to configure, customize, and enhance them to fit your definition of "in sync".