Backup
Logical Backup
Copyright
© Postgres Professional, 2017–2025
Authors: Egor Rogov, Pavel Luzanov, Ilya Bashtanov, Igor Gnatyuk
Translated by: Liudmila Mantrova, Alexander Meleshko, Elena Sharafutdinova
Photo by: Oleg Bartunov (Phu monastery, Bhrikuti summit, Nepal)
Use of Course Materials
Non-commercial use of course materials (presentations, demonstrations) is
allowed without restrictions. Commercial use is possible only with the written
permission of Postgres Professional. It is prohibited to make changes to the
course materials.
Feedback
Please send your feedback, comments and suggestions to:
edu@postgrespro.ru
Disclaimer
Postgres Professional assumes no responsibility for any damages and losses,
including loss of income, caused by direct or indirect, intentional or accidental
use of course materials. Postgres Professional company specifically disclaims
any warranties on course materials. Course materials are provided “as is,” and
Postgres Professional company has no obligations to provide maintenance,
support, updates, enhancements, or modifications.
Topics
Logical and Physical Backup
Backup and Restore of a Table
Backup and Restore of a Database
Backup and Restore of the Whole Cluster
Logical Backup
SQL commands to create objects and fill them with data
+ backup of a particular object or a database
+ restore to a different architecture or PostgreSQL version
+ (binary compatibility is not required)
+ ease of use
− modest operation speed
− no point-in-time recovery
A logical backup is a set of SQL commands that can restore the database
cluster (or a particular database or table) from scratch: it creates all the
required objects and fills them with data.
These commands can be run on a different server version (if it provides
compatibility at the command level) or on a different platform or architecture
(binary compatibility is not required).
In particular, a logical backup can be used for long-term storage: you can
restore it even after upgrading the server to a higher version.
The process of creating a logical backup is relatively easy. It is usually
enough to run a single command or launch a single utility.
But for large databases, executing these commands can take a very long
time. Besides, a logical backup can restore the database only to its state
at the point in time when the backup process was initiated.
Physical Backup
A copy of the database cluster's file system
+ faster than logical backup
+ statistics are restored
− restore is only possible on a compatible system, with the same PostgreSQL
major version installed
− partial backup is impossible, the whole cluster is copied
WAL archive
+ point-in-time recovery is available
A physical backup implies creating a copy of all files related to the database
cluster: in other words, creating its full binary copy.
It is faster to copy files than dump SQL commands; besides, unlike restoring
a logical backup, starting a server using a physical copy is a matter of
several minutes. Another advantage is that you do not have to rebuild
statistical data: it is also restored from the physical copy.
But this approach has its own shortcomings. A physical backup can be used
to restore the system only on a compatible platform (that has the same OS,
architecture, etc.) with the same major PostgreSQL version installed.
Besides, it is impossible to create a physical copy of a specific database:
you can only back up the whole database cluster.
Physical backups are usually used together with WAL archives. This enables
system recovery not only to the time of backup creation, but also to an
arbitrary point in time.
Creating physical backups for any important production systems is a
common practice. It is the responsibility of a DBA to take such backups.
Making a Table Copy in SQL
[slide illustration: sample rows of an airports table: airport code, name, coordinates]
=# COPY table TO 'file';
=# COPY table FROM 'file';
the file is located in the server file system and can be accessed by the owner of the
PostgreSQL instance
you can specify the columns to copy (or use an arbitrary query)
restored rows are added to already existing ones
the format is adjustable
If you want to save only the contents of one table, you can use the COPY
command.
The COPY TO flavor of this command enables you to save the table (or
some of its columns, or even the result of an arbitrary query) into a file,
display it in the terminal, or provide it as input to an application. You can
specify parameters such as format (plain text, CSV or binary), field
separators, NULL representation, and more.
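For example, a minimal sketch (the airports table and its column names are
illustrative; the file path is on the server and must be writable by the OS
user running PostgreSQL):

```sql
-- Export selected columns as CSV with a header row.
COPY airports (airport_code, airport_name, coordinates)
    TO '/tmp/airports.csv' WITH (FORMAT csv, HEADER);

-- An arbitrary query can be exported as well, with a custom delimiter:
COPY (SELECT airport_code, airport_name FROM airports ORDER BY airport_code)
    TO '/tmp/airport_names.csv' WITH (FORMAT csv, HEADER, DELIMITER ';');
```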
The COPY FROM variant of this command reads rows from a file or the
console and inserts them into a table. The table is not cleared first; the new
rows are simply appended to the existing ones.
The COPY command is significantly faster than similar INSERT commands,
because the client does not need to access the server repeatedly, and the
server does not have to analyze the commands multiple times.
Notably, the COPY FROM command ignores rules defined on the table,
although integrity constraints and triggers are respected.
Making a Table Copy in psql
=# \copy table to 'file'
=# \copy table from 'file'
the file is located in the client file system and can be accessed by the OS user who
has started psql
the data is transferred between the client and the server
the syntax and the supported features are analogous to those provided by COPY
In psql, there is a client version of the COPY command with a similar syntax.
The file name in the SQL command corresponds to a file on the database
server. The user running PostgreSQL (usually postgres) must have access
to this file.
The client-side implementation of this command refers to a file located on
the client, which lets you keep a local copy of the data even without access
to the server file system. The table contents are automatically transferred
between the client and the server.
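A sketch of the client-side variant (the table name is illustrative; the file is
created on the machine where psql runs, and the options mirror those of COPY):

```sql
-- Save a local copy; the path is resolved on the client.
\copy airports to 'airports_local.csv' with (format csv, header)

-- Load it back; the rows are appended to the table.
\copy airports from 'airports_local.csv' with (format csv, header)
```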
Database Backup
[slide illustration: the beginning of a plain-format dump: header comments
and SET commands (statement_timeout, lock_timeout, client_encoding, ...)]
$ pg_dump -d base -f file
$ psql -f file
format: SQL commands
you can specify the database objects to be dumped
the new database must be created from the template0 template
roles and tablespaces must be created in advance
it makes sense to perform ANALYZE after restoring
The pg_dump utility creates a full-scale database backup.
If you omit the file name (-f, --file), the utility’s output will be displayed in the
terminal. The produced output is a script to be run in psql; it contains the
commands that will create the required objects and fill them with data.
You can use optional parameters to limit the set of backed up objects: for
example, you can choose to back up only particular tables, objects in
particular schemas, or use other filters.
To restore the objects from the backup, just run the generated script in psql.
Note that the database to be restored should be cloned from template0,
since all the changes made to template1 will also make it into the backup.
Besides, all the required roles and tablespaces must be set up in advance.
Since these objects do not belong to any particular database, they won’t be
included into the dump.
Once the database is restored, it makes sense to run the ANALYZE
command: it will collect statistics that the optimizer requires for query
planning.
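A typical plain-format dump-and-restore session might look like this
(database and file names are illustrative):

```shell
$ pg_dump -d bookstore -f bookstore.sql      # dump the database as an SQL script
$ createdb -T template0 bookstore_copy       # the target is cloned from template0
$ psql -d bookstore_copy -f bookstore.sql    # replay the script
$ psql -d bookstore_copy -c 'ANALYZE'        # rebuild optimizer statistics
```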
The custom Format
[slide illustration: custom-format archive listing: header (archive creation
time, dbname, TOC entries, compression, dump version, format CUSTOM)
followed by selected TOC entries]
$ pg_dump -d database -F c -f file
$ pg_restore -d database -j N file
an internal format with a table of contents (TOC)
database objects to be restored can be selected at the time of restore
restoring can be performed in parallel mode
The pg_dump utility allows you to specify the backup format. By default, the
plain format is used; it produces an SQL script for psql.
The custom format (-F c, --format=custom) creates a backup in a special
format that contains not only the backed up objects, but also a table of
contents (TOC). Having a TOC allows you to choose the objects right at the
time of restore, not while making the dump.
By default, the output of the custom format is compressed.
To restore the database, you need to run another utility: pg_restore. It reads
the file and converts it to psql commands. If you do not explicitly provide the
database name (with the -d option), all commands will be output to the
terminal. If the database is specified, pg_restore will connect to this
database and execute the commands; you won’t have to start psql.
To restore only some of the objects, you can use one of the following
approaches. The first one is to filter the objects to be restored, just like it is
done in pg_dump. In fact, pg_restore shares many parameters with
pg_dump.
The second option is to use the TOC to retrieve the list of objects included
into the backup (via the --list option). Then you can edit this list manually:
delete the objects you do not need and pass the modified list to pg_restore
(via the --use-list option).
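The TOC-based workflow can be sketched as follows (names are illustrative):

```shell
$ pg_dump -d bookstore -F custom -f bookstore.dump
$ pg_restore --list bookstore.dump > backup.toc    # extract the table of contents
$ # edit backup.toc: comment out (with ;) or delete the entries you do not need
$ pg_restore -d bookstore_copy --use-list=backup.toc bookstore.dump
```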
The directory Format
$ pg_dump -d database -F d -j N -f directory
$ pg_restore -d database -j N directory
the directory contains a separate file for each database object and a TOC
database objects to be restored can be selected at the time of restoration
both dump and restore operations can be performed in parallel mode
consistency is ensured by a shared data snapshot
You can also create backups in the directory format. In this case, pg_dump
produces a whole directory instead of a single file; it contains the backed up
objects and the table of contents. By default, all files in the directory are
compressed.
Its advantage over the custom format is that such a backup can be created
concurrently using several processes (the number of processes is specified
in option -j, --jobs).
Naturally, the backup will contain consistent data even though it has been
created concurrently. Consistency is ensured by using a single data
snapshot for all the parallel processes.
Data restoration can also be performed in parallel mode (it is also supported
for the custom format).
Other capabilities are quite similar to those provided by the previously
discussed formats: the directory format supports the same options and
approaches.
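A sketch of a parallel dump and restore in the directory format (the worker
count and names are illustrative):

```shell
$ pg_dump -d bookstore -F directory -j 4 -f bookstore.dir   # four parallel dump workers
$ pg_restore -d bookstore_copy -j 4 bookstore.dir           # parallel restore
```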
Format Comparison
                     plain   custom      directory   tar
restoration utility  psql    pg_restore  pg_restore  pg_restore
compression          no      zlib        zlib        no
partial restore      no      yes         yes         yes
parallel backup      no      no          yes         no
parallel restore     no      yes         yes         no
This slide compares different features provided by different backup formats.
Note that there is also one more format available: tar. We do not cover it
here as it does not bring anything new and has no advantages as compared
to other formats. In fact, this format is simply a version of the directory
format in a tar archive, but it does not support compression or parallel
execution.
Cluster Backup
[slide illustration: the beginning of a pg_dumpall script: header comments,
SET commands, and CREATE ROLE statements]
$ pg_dumpall -f file
$ psql -f file
format: SQL commands
dumps the whole cluster, including roles and tablespaces
the user must have access to all objects in the database cluster
parallel backups are not supported
pg_dumpall creates a backup of the entire cluster, including roles and
tablespaces.
Since pg_dumpall requires access to all objects of all databases, it should
be run by a superuser or a user with the pre-defined role pg_read_all_data.
pg_dumpall connects to each database in the cluster one by one and backs
each of them up using pg_dump. In addition, it also stores data related to the
cluster as a whole.
To start this process, pg_dumpall has to establish a connection with any
available database. By default, postgres or template1 is selected, but you
can specify another one.
The result of pg_dumpall is a script for psql. Other formats are not
supported. This means that pg_dumpall does not support parallel execution,
which can be a problem for larger clusters. In this case, you can use the
--globals-only option to dump only roles and tablespaces, and dump the
databases separately using pg_dump in parallel mode.
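This split approach can be sketched as follows (names are illustrative):

```shell
$ pg_dumpall --globals-only -f globals.sql                  # roles and tablespaces only
$ pg_dump -d bookstore -F directory -j 4 -f bookstore.dir   # each database separately
# on the target cluster: restore the globals first, then each database
$ psql -f globals.sql
$ pg_restore -d postgres --create -j 4 bookstore.dir
```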
Takeaways
You can make a logical backup of a whole cluster, a specific
database or specific database objects
Good for
small amounts of data
long-term storage, during which the server can be upgraded
migration to a different platform
Not so good for
crash recovery with minimal data loss
Practice
1. As an employee, back up the bookstore database in the custom
format.
“Accidentally” empty the authorship table. Check that the
application has stopped displaying book titles in Bookstore,
Books, and Catalog tabs.
Use the backup to restore the lost data.
Verify that normal operation of the bookstore is restored.
1. Include the user in the pre-defined role pg_read_all_data. Use the
--data-only option for the restore operation: the emptied table still exists,
so an attempt to create it again would result in an error.
Practice+
1. The psql \copy command lets you pass the output as input to an
arbitrary application. Use this capability to open the result of
some query in LibreOffice Calc.
1. The command must save the result into a file and then start
libreoffice with this file passed as a parameter. The file must be saved
in the CSV format.
Naturally, this approach is platform-dependent and will require modifications,
say, on Windows.
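One possible solution sketch on Linux (the query and file path are
illustrative; note that psql waits until the program finishes, so it stays
blocked while LibreOffice is open):

```shell
=# \copy (select * from airports) to program 'cat > /tmp/result.csv && libreoffice --calc /tmp/result.csv' with (format csv, header)
```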