Description

The following table lists 2 DB servers using PostgreSQL streaming replication for synchronization:

Server  Role     Method                 Mode
db01    Primary  -                      -
db02    Standby  Streaming replication  async

db02 is configured as a failover but also used as a reporting DB, meaning SQL queries run on db02 while it is kept up-to-date with db01 using standard asynchronous streaming replication.

Issue

It seems that the Standby (db02) can slow down the Primary (db01) when it falls behind and has to catch up. This happens after a long-running query (e.g. a report query) on the Standby (db02) has prevented it from staying up-to-date with the Primary (db01).

Breaking it down into steps:

  1. A long-running SQL query on the Standby (db02) prevents it from staying up-to-date with the Primary (db01).
  2. The SQL query completes (or is killed after reaching its timeout) on the Standby (db02).
  3. The Primary (db01) then appears to give priority to the Standby (db02) while it catches up. During this time performance is impacted: the Primary (db01) processes fewer queries from the application while the Standby (db02) is catching up.
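
To put numbers on the steps above, the lag can be observed from the Primary while the Standby is catching up. This is only a minimal sketch, assuming PostgreSQL 14 (as the data directory suggests) and a role allowed to read `pg_stat_replication`. The column aliases are made up for illustration.

```sql
-- Run on the Primary (db01). One row per connected standby.
SELECT application_name,
       client_addr,
       state,        -- 'catchup' while the standby is behind, 'streaming' once caught up
       sync_state,   -- 'async' is expected for this setup
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS not_yet_sent_bytes,
       pg_wal_lsn_diff(sent_lsn, replay_lsn)           AS sent_not_replayed_bytes,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;
```

If `not_yet_sent_bytes` stays small while `sent_not_replayed_bytes` grows, the Standby is receiving WAL but delaying replay, which by itself should not hold back the Primary.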

Question

Is it expected that a Primary DB can be impacted by a Standby DB falling behind, even with standard streaming replication, which is asynchronous by default?

Configuration

Here's the configuration for both servers:

db01 (Primary)
listen_addresses = '*'
port = 5432
data_directory = '/data/postgres/14/pg_data'
shared_buffers = 6GB
effective_cache_size = 12GB
archive_mode = on
archive_timeout = 15min
archive_command = '/usr/bin/pgbackrest --stanza=prod archive-push %p'
autovacuum = on
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_threshold = 50
checkpoint_completion_target = 0.9
default_statistics_target = 100
huge_pages = on
logging_collector = on
log_autovacuum_min_duration = 60s
log_directory = pg_log
log_filename = 'postgresql-%Y-%m-%d.log'
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h'
log_lock_waits = on
log_min_messages = warning
log_rotation_age = 0
log_rotation_size = 1GB
max_locks_per_transaction = 512
max_wal_senders = 10
max_wal_size = 4GB
min_wal_size = 2GB
password_encryption = 'scram-sha-256'
ssl = on
ssl_ciphers = 'HIGH:+3DES:!aNULL'
ssl_min_protocol_version = 'TLSv1.2'
ssl_prefer_server_ciphers = on
superuser_reserved_connections = 3
synchronous_commit = on
track_counts = on
track_activity_query_size = 8192
wal_buffers = '-1'
wal_compression = off
wal_keep_size = 1600MB
wal_level = replica
work_mem = 64MB
fsync = on
autovacuum_max_workers = 5
autovacuum_work_mem = 256MB
checkpoint_timeout = 600s
maintenance_work_mem = 520MB
max_connections = 90

db02 (Standby)
listen_addresses = '*'
port = 5432
data_directory = '/data/postgres/14/pg_data'
shared_buffers = 6GB
effective_cache_size = 12GB
archive_mode = on
archive_timeout = 15min
archive_command = '/bin/true'
autovacuum = on
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_threshold = 50
checkpoint_completion_target = 0.9
default_statistics_target = 100
huge_pages = on
logging_collector = on
log_autovacuum_min_duration = 60s
log_directory = pg_log
log_filename = 'postgresql-%Y-%m-%d.log'
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h'
log_lock_waits = on
log_min_messages = warning
log_rotation_age = 0
log_rotation_size = 1GB
max_locks_per_transaction = 512
max_wal_senders = 10
max_wal_size = 4GB
min_wal_size = 2GB
password_encryption = 'scram-sha-256'
ssl = on
ssl_ciphers = 'HIGH:+3DES:!aNULL'
ssl_min_protocol_version = 'TLSv1.2'
ssl_prefer_server_ciphers = on
superuser_reserved_connections = 3
synchronous_commit = on
track_counts = on
track_activity_query_size = 8192
wal_buffers = '-1'
wal_compression = off
wal_keep_size = 1600MB
wal_level = replica
work_mem = 64MB
fsync = on
autovacuum_max_workers = 5
autovacuum_work_mem = 256MB
checkpoint_timeout = 600s
maintenance_work_mem = 520MB
max_connections = 90
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
primary_conninfo = 'host=db01 port=5432 user=repuser'
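
With `max_standby_streaming_delay = 900s`, replay on the Standby may be held back for up to 15 minutes by a conflicting query while WAL is still being received. A rough way to see this from the Standby side is sketched below, assuming PostgreSQL 14. The column aliases are illustrative only.

```sql
-- Run on the Standby (db02).
SELECT pg_is_in_recovery()                     AS in_recovery,
       pg_last_wal_receive_lsn()               AS received_lsn,
       pg_last_wal_replay_lsn()                AS replayed_lsn,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS received_not_replayed_bytes,
       now() - pg_last_xact_replay_timestamp() AS time_since_last_replayed_commit;

-- Queries cancelled on the Standby because of recovery conflicts, per database.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts;
```

Note that `time_since_last_replayed_commit` also grows when the Primary is simply idle, so it is only meaningful while the Primary is committing transactions.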
  • If the standby is falling behind, the primary has to buffer the replication data. Whether that'll noticeably slow it down, or whether that is what is happening in your case, we cannot tell with the information given in your question. Commented May 6 at 11:27
  • Are both instances running on the same hardware? (including storage/SAN) Commented May 6 at 14:24
  • Have you confirmed that synchronous_standby_names is empty on db01? Commented May 6 at 15:36
  • @AdrianKlaver This has been checked and no standby names are configured. Commented May 7 at 16:02
  • @FrankHeikens I've currently requested the unit handling the infrastructure to hand us the servers layout so that we can assess whether any hardware is shared among the instances. Commented May 7 at 16:06
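
The check on `synchronous_standby_names` discussed in the comments above could be done along these lines. A minimal sketch, assuming it is run on the Primary (db01):

```sql
-- Run on the Primary (db01).
SHOW synchronous_standby_names;  -- an empty value means no synchronous standbys are configured
SHOW synchronous_commit;         -- 'on' here only waits for the local WAL flush when the list above is empty

-- How the Primary currently classifies each connected standby.
SELECT application_name, sync_state
FROM pg_stat_replication;
```

A `sync_state` of `async` for db02 would confirm that commits on the Primary do not wait for the Standby.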

1 Answer

Unless you have hot_standby_feedback = on on the standby, activity on the standby cannot influence performance on the primary.
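
Whether feedback is actually in effect can be verified on both sides. This is a sketch only, assuming PostgreSQL 14 and sufficient privileges to read `pg_stat_replication`:

```sql
-- On the Standby (db02): the default is 'off'.
SHOW hot_standby_feedback;

-- On the Primary (db01): backend_xmin is the xmin horizon reported by the standby
-- through hot_standby_feedback, so it should be NULL when feedback is off.
SELECT application_name, backend_xmin
FROM pg_stat_replication;
```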

8 Comments

Precisely, this is what I understand from the documentation. However, the performance issue stopped instantly once the "streaming" part of the replication was removed (primary_conninfo removed from postgresql.auto.conf) from the Standby. As hot_standby_feedback is not defined on the Standby I don't understand what caused the issue.
You should analyze the performance issue. Since you said that queries are run on the standby, maybe that causes delays?
The Standby has less CPU and memory than the Primary in this case, but could this impact the Primary? The queries run on the Standby are for reporting software (performance is not an issue there). The issue is that the Primary slows down once the Standby tries to catch up: during that time it doesn't handle queries from the main application fast enough. As hot_standby_feedback is not defined, I don't understand how the Standby could have any impact on the Primary. The performance issue on the Primary stopped as soon as the streaming replication was removed on the Standby.
Well, set track_io_timing = on and EXPLAIN (ANALYZE, BUFFERS) the slow queries to get a clue. Perhaps both servers share the same storage system, and that's how the standby can influence the primary.
Thank you for the tip, I will try this and post the findings when the situation presents itself again.
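
The `track_io_timing` suggestion above might look roughly like this on the Primary. It is a sketch under assumptions: the table `orders` and column `created_at` are hypothetical stand-ins for one of the slow application queries, and changing `track_io_timing` requires superuser rights.

```sql
-- Enable I/O timing for the whole cluster and reload the configuration (no restart needed).
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();

-- EXPLAIN ANALYZE really executes the statement, so use it with care on writes.
-- "orders" and "created_at" are hypothetical; substitute a query that is actually slow.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM orders
WHERE created_at >= now() - interval '1 day';
```

Comparing the I/O timings in the plan output during a catch-up phase against a quiet period may show whether reads on the Primary get slower, which would point towards shared storage.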