
I have a medium-sized database (PostgreSQL 9.6) with moderate traffic. The database is located on a virtual server, described as having 4 CPU cores and 8192 MB of RAM.

Currently I back up the database every hour, using pg_dump on the server. This process can take some time, as you'd expect, but the reason for this question is that it eats up a lot of CPU, meaning we regularly see degraded performance throughout the day.

Our pg_dump is run like so, to generate a dump for each table individually, as well as a single dump of all tables:

# Dump each table in the public schema to its own custom-format file
for table in $(psql -d "XXX" -At -c "SELECT table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE' AND table_schema = 'public'");
    do pg_dump -Fc -t "$table" -d "XXX" > "$1/$table.bak";
done;
# Full dump of every table in a single custom-format file
pg_dump -Fc -d "XXX" > "$1/all_tables.bak";

So my question is: how can I optimize the backup process? Ideally, I am looking for the approach with the lowest CPU impact.

I have tried a few things so far, such as offloading the dump process to another server, but with limited results.
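To clarify, by offloading I mean running the same pg_dump from a different machine and connecting over the network, roughly like below; the host name, user and output path are placeholders. This moves the -Fc compression onto the other machine, although the database server still does the work of reading and serializing the data.

# Run from a separate host; the compression for -Fc happens on this host,
# not on the database server
pg_dump -h db-host -U backup_user -Fc -d "XXX" > /backups/all_tables.bak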

Any suggestions would be greatly appreciated!

  • I have to ask, why are you backing up your database every hour? Also why are you backing up every table twice? How about once a day, at night? Commented Jun 11, 2020 at 16:05
  • Why the dump per table? As you are using the custom format, you can restore a single table from that dump just as easily. Commented Jun 11, 2020 at 16:09
  • Note that this isn't really considered a "backup" but a dump. If you want a true backup, use WAL archiving or tools like pgBackRest or barman (which will have a lot less impact on the system). Commented Jun 11, 2020 at 16:10
  • I agree that dumping every hour may be a bit excessive, but the database is constantly changing, so regular backups have been useful in the past to minimize data loss... but yeah, I agree that should probably be reconsidered. As for the dump per table: if you pg_restore with -t, you don't fully restore the table (constraints, indexes and so on), whereas table-specific dumps allow table-specific restoration. I'll check out the WAL archiving suggestion, thanks @a_horse_with_no_name! Commented Jun 11, 2020 at 16:54
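For reference, the single-table restore mentioned in the comments looks roughly like this (the table name and file path are placeholders); as the last comment notes, -t restores only the table's definition and data, not its indexes or constraints:

# Restore one table from the full custom-format dump into the database;
# add --clean to drop and recreate the table if it already exists
pg_restore -d "XXX" -t my_table /path/to/all_tables.bak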

1 Answer


If you want backups with hourly granularity, you should probably use pg_basebackup plus WAL archiving (or WAL streaming, with archival from the replica) to create physical backups, rather than pg_dump to create logical ones. You can then use PITR (point-in-time recovery) to restore to almost any point in time you want. You will have to take a new base backup occasionally to keep restore time down, but almost certainly not every hour. Also, pg_basebackup has a low CPU load (apart from compression, but that is done on the local side, not the database side, if you run pg_basebackup over the network).
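A minimal sketch of what that setup could look like on 9.6; the host name, replication user, and the archive and backup paths are assumptions, not anything from the question:

# postgresql.conf on the database server: enable WAL archiving
# (this archive_command just copies segments to a local directory;
#  in practice you would ship them to separate storage)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /wal_archive/%f && cp %p /wal_archive/%f'

# Periodic base backup, run from another host so the gzip compression (-z)
# is done there rather than on the database server
pg_basebackup -h db-host -U replica_user -D /backups/base_$(date +%F) -Ft -z -X fetch -P

# For PITR, unpack a base backup into a fresh data directory and create a
# recovery.conf (9.6 syntax) before starting the server, for example:
restore_command = 'cp /wal_archive/%f %p'
recovery_target_time = '2020-06-11 12:00:00'

On 9.6 this also needs max_wal_senders set above zero and a replication entry in pg_hba.conf for the backup user.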
