
I have a medium-sized database (PostgreSQL 9.6) with moderate traffic. The database is located on a virtual server, described as having 4 CPU cores and 8192 MB of RAM.

Currently I back up the database every hour, using pg_dump on the server. This process can take some time, as you'd expect, but the reason for this question is that it eats up a lot of CPU, meaning we regularly see degraded performance throughout the day.

Our pg_dump is run like so, to generate a dump for each table individually, as well as a single dump of all tables:

# Dump each table in the public schema to its own custom-format file
for table in $(psql -d "XXX" -At -c "SELECT table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE' AND table_schema = 'public'");
    do pg_dump -Fc -t "$table" -d "XXX" > "$1/$table.bak";
done;
# Full dump of every table in a single custom-format file
pg_dump -Fc -d "XXX" > "$1/all_tables.bak";

So my question is: how can I optimize the backup process? Ideally, I am looking for the approach with the lowest CPU impact.

I have tried a few things so far, such as offloading the dump process to another server, but with limited results.
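To clarify, by offloading I mean running the same pg_dump from a different machine and connecting over the network, roughly like below; the host name, user and output path are placeholders. This moves the -Fc compression onto the other machine, although the database server still does the work of reading and serializing the data.

# Run from a separate host; the compression for -Fc happens on this host,
# not on the database server
pg_dump -h db-host -U backup_user -Fc -d "XXX" > /backups/all_tables.bak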

Any suggestions would be greatly appreciated!

  • I have to ask, why are you backing up your database every hour? Also why are you backing up every table twice? How about once a day, at night? Commented Jun 11, 2020 at 16:05
  • Why the dump per table? As you are using the custom format, you can restore a single table from that dump just as easily. Commented Jun 11, 2020 at 16:09
  • Note that this isn't really considered a "backup" but a dump. If you want a true backup, use WAL archiving or tools like pgBackRest or barman (which will have a lot less impact on the system). Commented Jun 11, 2020 at 16:10
  • I agree that dumping every hour may be a bit excessive, but the database is constantly changing, so regular backups have been useful in the past to minimize data loss... but yeah, I agree that should probably be reconsidered. As for the dump per table: if you pg_restore with -t, you don't fully restore the table (constraints, indexes and so on), whereas table-specific dumps allow table-specific restoration. I'll check out the WAL archiving suggestion, thanks @a_horse_with_no_name! Commented Jun 11, 2020 at 16:54
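For reference, the single-table restore mentioned in the comments looks roughly like this (the table name and file path are placeholders); as the last comment notes, -t restores only the table's definition and data, not its indexes or constraints:

# Restore one table from the full custom-format dump into the database;
# add --clean to drop and recreate the table if it already exists
pg_restore -d "XXX" -t my_table /path/to/all_tables.bak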

1 Answer


If you want backups with hourly granularity, you should probably use pg_basebackup plus WAL archiving (or WAL streaming, with archival from the replica) to create physical backups, rather than pg_dump to create logical ones. You can then use PITR (point-in-time recovery) to restore to almost any point in time you want. You will have to take a new base backup occasionally to keep restore time down, but almost certainly not every hour. Also, pg_basebackup has a low CPU load (apart from compression, but that is done on the local side, not the database side, if you run pg_basebackup over the network).
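A minimal sketch of what that setup could look like on 9.6; the host name, replication user, and the archive and backup paths are assumptions, not anything from the question:

# postgresql.conf on the database server: enable WAL archiving
# (this archive_command just copies segments to a local directory;
#  in practice you would ship them to separate storage)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /wal_archive/%f && cp %p /wal_archive/%f'

# Periodic base backup, run from another host so the gzip compression (-z)
# is done there rather than on the database server
pg_basebackup -h db-host -U replica_user -D /backups/base_$(date +%F) -Ft -z -X fetch -P

# For PITR, unpack a base backup into a fresh data directory and create a
# recovery.conf (9.6 syntax) before starting the server, for example:
restore_command = 'cp /wal_archive/%f %p'
recovery_target_time = '2020-06-11 12:00:00'

On 9.6 this also needs max_wal_senders set above zero and a replication entry in pg_hba.conf for the backup user.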
