Backup
Logical Backup
Copyright
© Postgres Professional, 2017–2025
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Igor Gnatyuk
Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
Postgres Professional assumes no responsibility for any damages and losses,
including loss of income, caused by direct or indirect, intentional or accidental
use of course materials. Postgres Professional company specifically disclaims
any warranties on course materials. Course materials are provided “as is,” and
Postgres Professional company has no obligations to provide maintenance,
support, updates, enhancements, or modifications.
Topics
Logical and Physical Backup
Backup and Restore of a Table
Backup and Restore of a Database
Backup and Restore of the Whole Cluster
Logical Backup
SQL commands to create objects and fill them with data
+ backup of a particular object or a database
+ restore to a different architecture or PostgreSQL version
+ (binary compatibility is not required)
+ ease of use
− modest operation speed
− no point-in-time recovery
A logical backup is a set of SQL commands that can restore the database
cluster (or a particular database or table) from scratch: it creates all the
required objects and fills them with data.
These commands can be run on a different server version (if it provides
compatibility at the command level) or on a different platform or architecture
(binary compatibility is not required).
In particular, a logical backup can be used for long-term storage: you can
restore it even after upgrading the server to a higher version.
The process of creating a logical backup is relatively easy. It is usually
enough to run a single command or launch a single utility.
But for large databases, executing these commands can take a very long
time. Besides, a logical backup can restore the database only to its state
at the point in time when the backup process was initiated.
Physical Backup
A copy of the database cluster's file system
+ faster than logical backup
+ statistics are restored
− restore is only possible on a compatible system, with the same PostgreSQL
major version installed
− partial backup is impossible, the whole cluster is copied
WAL archive
+ point-in-time recovery is available
A physical backup implies creating a copy of all files related to the database
cluster: in other words, creating its full binary copy.
It is faster to copy files than dump SQL commands; besides, unlike restoring
a logical backup, starting a server using a physical copy is a matter of
several minutes. Another advantage is that you do not have to rebuild
statistical data: it is also restored from the physical copy.
But this approach has its own shortcomings. A physical backup can be used
to restore the system only on a compatible platform (that has the same OS,
architecture, etc.) with the same major PostgreSQL version installed.
Besides, it is impossible to create a physical copy of a specific database:
you can only back up the whole database cluster.
Physical backups are usually used together with WAL archives. This enables
system recovery not only to the time of backup creation, but also to an
arbitrary point in time.
Creating physical backups for any important production systems is a
common practice. It is the responsibility of a DBA to take such backups.
Making a Table Copy in SQL
[slide illustration: sample rows of an airports table: airport code, name, coordinates]
=# COPY table TO 'file';
=# COPY table FROM 'file';
the file is located in the server file system and can be accessed by the owner of the
PostgreSQL instance
you can specify the columns to copy (or use an arbitrary query)
restored rows are added to already existing ones
the format is adjustable
If you want to save only the contents of one table, you can use the COPY
command.
The COPY TO flavor of this command enables you to save the table (or
some of its columns, or even the result of an arbitrary query) into a file,
display it in the terminal, or provide it as input to an application. You can
specify parameters such as format (plain text, CSV or binary), field
separators, NULL representation, and more.
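For example, a minimal sketch (the airports table and its column names are
illustrative; the file path is on the server and must be writable by the OS
user running PostgreSQL):

```sql
-- Export selected columns as CSV with a header row.
COPY airports (airport_code, airport_name, coordinates)
    TO '/tmp/airports.csv' WITH (FORMAT csv, HEADER);

-- An arbitrary query can be exported as well, with a custom delimiter:
COPY (SELECT airport_code, airport_name FROM airports ORDER BY airport_code)
    TO '/tmp/airport_names.csv' WITH (FORMAT csv, HEADER, DELIMITER ';');
```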
The COPY FROM variant of this command reads rows from a file or the
console and inserts them into a table. The table is not cleared first; the new
rows are simply appended to the existing ones.
The COPY command is significantly faster than similar INSERT commands,
because the client does not need to access the server repeatedly, and the
server does not have to analyze the commands multiple times.
Notably, the COPY FROM command ignores rules defined on the table,
although integrity constraints and triggers are respected.
Making a Table Copy in psql
=# \copy table to 'file'
=# \copy table from 'file'
the file is located in the client file system and can be accessed by the OS user who
has started psql
the data is transferred between the client and the server
the syntax and the supported features are analogous to those provided by COPY
In psql, there is a client version of the COPY command with a similar syntax.
The file name in the SQL command corresponds to a file on the database
server. The user running PostgreSQL (usually postgres) must have access
to this file.
The client-side implementation of this command refers to a file located on
the client, which lets you keep a local copy of the data even without access
to the server file system. The table contents are automatically transferred
between the client and the server.
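A sketch of the client-side variant (the table name is illustrative; the file is
created on the machine where psql runs, and the options mirror those of COPY):

```sql
-- Save a local copy; the path is resolved on the client.
\copy airports to 'airports_local.csv' with (format csv, header)

-- Load it back; the rows are appended to the table.
\copy airports from 'airports_local.csv' with (format csv, header)
```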
Database Backup
[slide illustration: the beginning of a plain-format dump: header comments
and SET commands (statement_timeout, lock_timeout, client_encoding, ...)]
$ pg_dump -d base -f file
$ psql -f file
format: SQL commands
you can specify the database objects to be dumped
the new database must be created from the template0 template
roles and tablespaces must be created in advance
it makes sense to perform ANALYZE after restoring
The pg_dump utility creates a full-scale database backup.
If you omit the file name (-f, --file), the utility’s output will be displayed in the
terminal. The produced output is a script to be run in psql; it contains the
commands that will create the required objects and fill them with data.
You can use optional parameters to limit the set of backed up objects: for
example, you can choose to back up only particular tables, objects in
particular schemas, or use other filters.
To restore the objects from the backup, just run the generated script in psql.
Note that the database to be restored should be cloned from template0,
since all the changes made to template1 will also make it into the backup.
Besides, all the required roles and tablespaces must be set up in advance.
Since these objects do not belong to any particular database, they won’t be
included into the dump.
Once the database is restored, it makes sense to run the ANALYZE
command: it will collect statistics that the optimizer requires for query
planning.
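A typical plain-format dump-and-restore session might look like this
(database and file names are illustrative):

```shell
$ pg_dump -d bookstore -f bookstore.sql      # dump the database as an SQL script
$ createdb -T template0 bookstore_copy       # the target is cloned from template0
$ psql -d bookstore_copy -f bookstore.sql    # replay the script
$ psql -d bookstore_copy -c 'ANALYZE'        # rebuild optimizer statistics
```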
The custom Format
[slide illustration: custom-format archive listing: header (archive creation
time, dbname, TOC entries, compression, dump version, format CUSTOM)
followed by selected TOC entries]
$ pg_dump -d database -F c -f file
$ pg_restore -d database -j N file
an internal format with a table of contents (TOC)
database objects to be restored can be selected at the time of restore
restoring can be performed in parallel mode
The pg_dump utility allows you to specify the backup format. By default, the
plain format is used; it produces an SQL script for psql.
The custom format (-F c, --format=custom) creates a backup in a special
format that contains not only the backed up objects, but also a table of
contents (TOC). Having a TOC allows you to choose the objects right at the
time of restore, not while making the dump.
By default, the output of the custom format is compressed.
To restore the database, you need to run another utility: pg_restore. It reads
the file and converts it to psql commands. If you do not explicitly provide the
database name (with the -d option), all commands will be output to the
terminal. If the database is specified, pg_restore will connect to this
database and execute the commands; you won’t have to start psql.
To restore only some of the objects, you can use one of the following
approaches. The first one is to filter the objects to be restored, just like it is
done in pg_dump. In fact, pg_restore shares many parameters with
pg_dump.
The second option is to use the TOC to retrieve the list of objects included
into the backup (via the --list option). Then you can edit this list manually:
delete the objects you do not need and pass the modified list to pg_restore
(via the --use-list option).
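The TOC-based workflow can be sketched as follows (names are illustrative):

```shell
$ pg_dump -d bookstore -F custom -f bookstore.dump
$ pg_restore --list bookstore.dump > backup.toc    # extract the table of contents
$ # edit backup.toc: comment out (with ;) or delete the entries you do not need
$ pg_restore -d bookstore_copy --use-list=backup.toc bookstore.dump
```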
The directory Format
$ pg_dump -d database -F d -j N -f directory
$ pg_restore -d database -j N directory
the directory contains a separate file for each database object and a TOC
database objects to be restored can be selected at the time of restoration
both dump and restore operations can be performed in parallel mode
consistency is ensured by a shared data snapshot
You can also create backups in the directory format. In this case, pg_dump
produces a whole directory instead of a single file; it contains the backed up
objects and the table of contents. By default, all files in the directory are
compressed.
Its advantage over the custom format is that such a backup can be created
concurrently using several processes (the number of processes is specified
in option -j, --jobs).
Naturally, the backup will contain consistent data even though it has been
created concurrently. Consistency is ensured by using a single data
snapshot for all the parallel processes.
Data restoration can also be performed in parallel mode (it is also supported
for the custom format).
Other capabilities are quite similar to those provided by the previously
discussed formats: the directory format supports the same options and
approaches.
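A sketch of a parallel dump and restore in the directory format (the worker
count and names are illustrative):

```shell
$ pg_dump -d bookstore -F directory -j 4 -f bookstore.dir   # four parallel dump workers
$ pg_restore -d bookstore_copy -j 4 bookstore.dir           # parallel restore
```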
Format Comparison
                     plain   custom      directory   tar
restoration utility  psql    pg_restore  pg_restore  pg_restore
compression          no      zlib        zlib        no
partial restore      no      yes         yes         yes
parallel backup      no      no          yes         no
parallel restore     no      yes         yes         no
This slide compares different features provided by different backup formats.
Note that there is also one more format available: tar. We do not cover it
here as it does not bring anything new and has no advantages as compared
to other formats. In fact, this format is simply a version of the directory
format in a tar archive, but it does not support compression or parallel
execution.
Cluster Backup
[slide illustration: the beginning of a pg_dumpall script: header comments,
SET commands, and CREATE ROLE statements]
$ pg_dumpall -f file
$ psql -f file
format: SQL commands
dumps the whole cluster, including roles and tablespaces
the user must have access to all objects in the database cluster
parallel backups are not supported
pg_dumpall creates a backup of the entire cluster, including roles and
tablespaces.
Since pg_dumpall requires access to all objects of all databases, it should
be run by a superuser or a user with the pre-defined role pg_read_all_data.
pg_dumpall connects to each database in the cluster one by one and backs
each of them up using pg_dump. In addition, it also stores data related to the
cluster as a whole.
To start this process, pg_dumpall has to establish a connection with any
available database. By default, postgres or template1 is selected, but you
can specify another one.
The result of pg_dumpall is a script for psql. Other formats are not
supported. This means that pg_dumpall does not support parallel execution,
which can be a problem for larger clusters. In this case, you can use the
--globals-only option to dump only roles and tablespaces, and dump the
databases separately using pg_dump in parallel mode.
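This split approach can be sketched as follows (names are illustrative):

```shell
$ pg_dumpall --globals-only -f globals.sql                  # roles and tablespaces only
$ pg_dump -d bookstore -F directory -j 4 -f bookstore.dir   # each database separately
# on the target cluster: restore the globals first, then each database
$ psql -f globals.sql
$ pg_restore -d postgres --create -j 4 bookstore.dir
```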
Takeaways
You can make a logical backup of a whole cluster, a specific
database or specific database objects
Good for
small amounts of data
long-term storage, during which the server can be upgraded
migration to a different platform
Not so good for
crash recovery with minimal data loss
Practice
1. As an employee, back up the bookstore database in the custom
format.
“Accidentally” empty the authorship table. Check that the
application has stopped displaying book titles in Bookstore,
Books, and Catalog tabs.
Use the backup to restore the lost data.
Verify that normal operation of the bookstore is restored.
1. Include the user in the pre-defined role pg_read_all_data. Use the
--data-only option for the restore operation: the emptied table still exists,
so an attempt to create it again would result in an error.
Practice+
1. The psql \copy command lets you pass the output as input to an
arbitrary application. Use this capability to open the result of
some query in LibreOffice Calc.
1. The command must save the result into a file and then start
libreoffice with this file passed as a parameter. The file must be saved
in the CSV format.
Naturally, this approach is platform-dependent and will require modifications,
say, on Windows.
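One possible solution sketch on Linux (the query and file path are
illustrative; note that psql waits until the program finishes, so it stays
blocked while LibreOffice is open):

```shell
=# \copy (select * from airports) to program 'cat > /tmp/result.csv && libreoffice --calc /tmp/result.csv' with (format csv, header)
```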