
According to user comments, PostgreSQL data checksums have very minimal runtime overhead (both CPU and storage) but would allow (among other things) using pg_rewind for point-in-time recovery (PITR). However, data checksums are not enabled by default, and enabling them on an already existing HA cluster is not possible without pretty significant downtime. (If I've understood correctly, you cannot enable checksums on a hot standby only and promote it as the new master once enabling the checksums on the hot standby is complete.)

Are there some little-known issues with enabling data checksums by default? Or is the default state (checksums disabled) just due to historical reasons, even though enabling data checksums would make much more sense in all cases?
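(A side note on the downtime point: since PostgreSQL 12, the bundled pg_checksums tool can enable or disable checksums on an existing cluster, but only while the server is cleanly shut down, which is where the downtime comes from. A sketch, with the data directory path as a placeholder:)

```shell
pg_ctl -D /var/lib/postgresql/data stop        # cluster must be cleanly shut down

pg_checksums --check   -D /var/lib/postgresql/data  # verify existing checksums
pg_checksums --enable  -D /var/lib/postgresql/data  # rewrites every data block (slow on big clusters)
# pg_checksums --disable -D /var/lib/postgresql/data  # cheap: only updates the control file

pg_ctl -D /var/lib/postgresql/data start
```

Enabling rewrites every block, so the offline window grows with cluster size; disabling is nearly instant.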

  • Interesting question (+1) - I'd say that it's something like "enable as little as possible by default - let your users decide" - i.e. make the system as lean as possible by default - BTW, this is a complete guess on my part! Commented Jun 8, 2023 at 20:35

3 Answers


PostgreSQL 18 enables data checksums by default when creating a new cluster:

By default, data pages are protected by checksums, but this can optionally be disabled for a cluster. When enabled, each data page includes a checksum that is updated when the page is written and verified each time the page is read. Only data pages are protected by checksums; internal data structures and temporary files are not.
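The choice is made at cluster creation time with initdb and can be inspected on a running server. A sketch, assuming PostgreSQL binaries on the PATH and placeholder data directories:

```shell
# PostgreSQL 18: checksums are on by default; opt out explicitly if desired.
initdb -D /tmp/pg18_demo                        # data checksums enabled (v18 default)
initdb -D /tmp/pg18_nochk --no-data-checksums   # explicitly disabled (flag new in v18)

# On older releases the default is off; opt in at initdb time with:
# initdb -D /tmp/pgdata --data-checksums

# Once the server is running, the state is visible as a read-only parameter:
psql -c 'SHOW data_checksums;'                  # "on" or "off"
```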

See also a blog post from a company that offers third-level PostgreSQL support:

With release of PostgreSQL 18, the community decided to turn on data‑checksums by default – a major step toward early detection of these failures.

(credativ.de blog, 2025-11-03)

This indicates that whatever reasons there were in the past for keeping checksums disabled - perceived risk, performance implications, overhead - have been re-evaluated and resolved by the development team.

The discussion of the patch that added this goes a bit into the previous reasoning:

There was some hesitation years ago when this feature was first added, leading to the current situation where the default is off. However, many years later, there is wide consensus that this is an extraordinarily safe, desirable setting. Indeed, most (if not all) of the major commercial and open source Postgres systems currently turn this on by default.

I think the last time we discussed this the consensus was that computational overhead of computing the checksums is pretty small for most systems (so the above change seems warranted regardless of whether we switch the default), but turning on wal_compression also turns on wal_log_hints, which can increase WAL by quite a lot.

Depending on your storage stack, PostgreSQL checksums are either a game changer or superfluous. For example, if your PostgreSQL database is located on a filesystem that already implements checksumming, such as Btrfs or ZFS, you don't need PostgreSQL checksums. However, PostgreSQL performance on copy-on-write filesystems (such as the ones mentioned) tends to suffer, so something else is often used instead.
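Whether the filesystem layer is already checksumming can be verified directly; a sketch for the two filesystems mentioned (the dataset and mount point names are placeholders):

```shell
# ZFS: checksumming is a per-dataset property (default: on)
zfs get checksum tank/pgdata

# Btrfs: data checksums are on unless mounted with nodatasum;
# a scrub verifies all checksums against the stored data
btrfs scrub start /var/lib/postgresql
btrfs scrub status /var/lib/postgresql
```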

Even if you have some kind of expensive enterprise storage where the vendor promises internal checksumming and redundancy, it is a black box: likely containing tons of complexity and code of unknown quality, with a non-zero bug probability, and with no way to know how much of the product price goes into marketing, sales bonuses, and the C-suite's Porsches, Lambos, and yachts, versus QA and thorough engineering. Additional checksumming at a higher layer can thus be seen as defense in depth.


  • Without checksumming on the PostgreSQL level, PostgreSQL cannot do point-in-time recovery (PITR). As a result, you should have checksumming enabled on the PostgreSQL level anyway. Checksumming on the filesystem level can only make problems surface faster and maybe allow filesystem-level recovery from media errors (e.g., re-reading the data from another disk in a RAID setup) without triggering errors on the PostgreSQL level. Commented Nov 23 at 8:38
  • I'm marking this answer as the accepted one. Since PostgreSQL has changed the default to enabling checksums, the old default was just due to historical reasons. Commented Nov 23 at 8:40

Enabling data checksums is not for free:

  • you have to calculate the checksums frequently, which costs CPU

  • data checksums require that hint bits are WAL logged, which increases the amount of WAL that needs to be written

Also, data checksums don't offer any benefit unless you are using shoddy storage, where data could change between the time you wrote a block and the time you read it again.

If you want to use pg_rewind, you don't need data checksums. It is sufficient to enable the parameter wal_log_hints.
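A minimal sketch of that alternative: wal_log_hints is set on the server pg_rewind will copy from, and after a failover the demoted primary is resynchronized against the new one. Paths and connection details below are placeholders:

```shell
# In postgresql.conf on the (future) source server:
#   wal_log_hints = on    # requires a restart; implied by data checksums anyway

# After failover, resynchronize the old primary with the new one:
pg_rewind --target-pgdata=/var/lib/postgresql/data \
          --source-server='host=new-primary port=5432 user=postgres dbname=postgres'
```

pg_rewind refuses to run unless the source cluster has either data checksums or wal_log_hints enabled, which is what makes these two options interchangeable for this purpose.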

Note that PostgreSQL v18 has nonetheless enabled data checksums by default. People who use reliable hardware can still disable them, and it is a good idea to set default values with consideration for those users who can't be bothered to read the documentation and tune.

  • How much would enabling wal_log_hints typically increase the WAL size? Is it closer to +1% or +100%? Commented Jun 9, 2023 at 14:19
  • The old answer: it depends. It could be virtually zero, but if you load the data, then have a checkpoint, then query the data, it could be 100%, since all the blocks are modified by the first reader, who sets the hint bits. Commented Jun 9, 2023 at 14:27
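Rather than guessing, the overhead can be measured on a test system by comparing WAL positions around a representative workload, using standard functions; a sketch (the workload itself is a placeholder):

```shell
psql <<'SQL'
SELECT pg_current_wal_lsn() AS before \gset
-- ... run a representative workload here, e.g. load, CHECKPOINT, then first read ...
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before'))
       AS wal_generated;
SQL
```

Running the same workload with wal_log_hints on and off gives the actual delta for your access pattern instead of a generic percentage.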

I will quote some opinions from core PostgreSQL developers:

That's nice for us but I'm not sure that it's a benefit for users. I've seen little if any data to suggest that checksums actually catch enough problems to justify the extra CPU costs and the risk of false positives.

My problem is more that I'm not confident the checks are mature enough. The basebackup checks are atm not able to detect random data, and neither basebackup nor backend checks detect zeroed out files/file ranges.

I can believe that many users have shared_buffers set to its default value and that we are going to get complaints about "performance drop after upgrade to v12" if we switch data checksums to on by default.

data checksums can catch not so many actual data corruptions, are not free (especially with small sizes of shared_buffers - checksums are calculated when writing a page to disk or when reading from disk to shared_buffers) neither by CPU nor by WAL size.

No issues per se, but the general consensus so far is that there's not much benefit for end users to have it enabled by default.

  • I would have expected shared_buffers to be increased in all cases where performance actually matters, because it's that critical. Maybe PostgreSQL should automatically adjust shared_buffers by default instead of hardcoding a really small default? For example, the default could be auto, interpreted as 10% of total system RAM. Commented Jun 9, 2023 at 14:23
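(For reference, the stock default is only 128MB, and raising it on a dedicated server is a one-liner; the 4GB figure below is just an illustrative value, with ~25% of RAM being a commonly cited starting point:)

```shell
# Writes the setting to postgresql.auto.conf; shared_buffers needs a restart
psql -c "ALTER SYSTEM SET shared_buffers = '4GB';"
pg_ctl -D /var/lib/postgresql/data restart
```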
