
We are using the Debezium connector for PostgreSQL (Debezium 1.3, PostgreSQL 10.2) with the pgoutput plugin. We are facing a peculiar problem: when we run a SELECT on PostgreSQL that returns a lot of data from tables for which replication is not enabled, and the query takes a long time, the retained WAL size keeps growing. Once the SELECT returns its data, the WAL size decreases and returns to normal after some time.

We already have a heartbeat query and a heartbeat topic configured in Debezium, but that is not helping much. When we check the Debezium logs while the SELECT statement is running, we see messages saying that offsets are being committed, so I think Debezium is still committing the LSNs it has already consumed.

Today we tried to reproduce the issue and saw that, while the SELECT query was running, Debezium lost connectivity with one of the Kafka brokers. I am not sure whether that was a coincidence or related, but Debezium kept trying to connect to the broker for a long time and succeeded only after 10+ minutes. Until it was able to connect to the Kafka broker, the LSN was not being committed and the WAL size kept increasing.

We tried everything available in the Debezium documentation and in linked articles from others who reported similar issues with Debezium, but we are still not able to find the root cause of the WAL size increase.

Because of this issue, we are not able to run any query in the DB that takes more than 5-10 minutes. Sometimes it hangs in such a way that it never recovers, and we have to drop the replication slot and recreate it. In a few scenarios we can't even drop the replication slot and have had to restart the DB to kill the process (we are on AWS RDS, so we can't log in to the box and kill the process).

CDC is heavily used in the project. Upgrading to the latest versions of Debezium (2.2) and PostgreSQL (14) is in our future plans, but we are looking for solutions we can apply now.

Questions:

What is the link between a SELECT query and the WAL size increasing?

When Debezium loses connectivity with one of the brokers, shouldn't it ignore that broker after a few seconds, rebalance, and continue? Why did it keep retrying for so long? (It took almost 10 minutes to reconnect, and during that time it tried to reconnect again and again.)

We used the heartbeat topic and the heartbeat query, but they didn't help much.
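To see which slot position is holding WAL while the long SELECT runs, a query along these lines may help. It is a minimal sketch that assumes PostgreSQL 10 or later (where `pg_current_wal_lsn()` and `pg_wal_lsn_diff()` are available) and a role allowed to read `pg_replication_slots`; the column aliases `retained_wal` and `unconfirmed_wal` are just illustrative names.

```sql
-- Per-slot view of how much WAL the server must keep (restart_lsn)
-- versus how far the consumer has confirmed (confirmed_flush_lsn).
SELECT slot_name,
       plugin,
       active,
       active_pid,
       restart_lsn,
       confirmed_flush_lsn,
       -- WAL that cannot be recycled because of this slot
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
       -- WAL generated since the consumer's last confirmed position
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS unconfirmed_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
```

If `confirmed_flush_lsn` keeps advancing while `restart_lsn` stays put during the SELECT, that would suggest the consumer is acknowledging data but something on the server side is preventing the slot's restart point from moving forward.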

  • Have you tried their chat? debezium.zulipchat.com/# Commented May 9, 2023 at 23:03
  • No, I didn't know about it. I will try it, thanks for the link. Commented May 9, 2023 at 23:12

1 Answer


PostgreSQL 10.2? Are you kidding?

Upgrade to a reasonable version (in particular, always use the latest minor release of whatever version you are running), and try

ALTER SUBSCRIPTION name SET (streaming = on);

8 Comments

Well, as I said, we are planning to upgrade by mid-September, but does that ALTER statement work on the version I am using?
No, it doesn't, and I cannot even be sure that this is what fixes your problem. But it would be worth a try.
Understood. Do you see any relation between running a SELECT statement and the WAL size continuing to increase? Sorry to ask the same question again; I could not find any documentation about it.
A long statement is a long transaction, and I suspect that is the connection.
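One way to test that suspicion is to list the transactions that are open while the WAL is growing. The query below is a sketch assuming PostgreSQL 10 or later; the five-minute threshold is an arbitrary value taken from the 5-10 minute range mentioned in the question, and the `xact_age` and `query_start_text` aliases are illustrative.

```sql
-- Sessions whose current transaction has been open for more than 5 minutes.
-- backend_xid / backend_xmin show whether the session holds a transaction ID
-- or a snapshot that other processes may have to wait behind.
SELECT pid,
       usename,
       application_name,
       state,
       backend_xid,
       backend_xmin,
       now() - xact_start AS xact_age,
       left(query, 80)    AS query_start_text
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'  -- assumed threshold
  AND pid <> pg_backend_pid()                    -- skip this monitoring session
ORDER BY xact_start;
```

Running this alongside a look at `pg_replication_slots` during the long SELECT would show whether the oldest open transaction lines up with the period in which the slot stops releasing WAL.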
OK, now hopefully the last question: do you know which versions of Postgres support ALTER SUBSCRIPTION name SET (streaming = on);? In our dev environment we are on Postgres 12.11, and when I query pg_subscription I don't see any streaming column there. I am not sure whether I am checking correctly or whether that table is supposed to have that column. How can I check whether the ALTER worked after running the statement? I have to give this query to ops to run and confirm that the statement ran successfully.
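As far as I know, the `streaming` subscription option and the matching `substream` column in `pg_subscription` were introduced in PostgreSQL 14, which would explain why nothing like it shows up on 12.11. A check that could be handed to ops might look like the sketch below; it assumes PostgreSQL 14 or later, and the second statement fails with a "column does not exist" error on older versions.

```sql
-- Confirm the server version first.
SHOW server_version;

-- PostgreSQL 14+ only: substream reflects the subscription's streaming setting.
SELECT subname,
       subenabled,
       substream
FROM pg_subscription;
```

Note that `pg_subscription` only lists subscriptions created with CREATE SUBSCRIPTION on a subscriber database. Debezium reads from a replication slot directly rather than through a subscription, so it may have no row there at all, in which case there would be nothing for the ALTER statement to act on.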