Partitioning of related tables in PostgreSQL

Question

I've checked documentation and saw some presentations, read blogs, but can't find examples of partitioning of more than a single table in PostgreSQL - and that's what we need. Our tables are insert only audit trail with master-detail structure and we aim to solve our problem with slow data removal problem, currently done using delete.

The simplified structure and some queries are shown in the following fiddle: https://www.db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/46

The issue I'm investigating right now is how to effectively query the detail table, be it in JOIN or directly. Because the timestamp field is part of the partition key I understand that using it in query is essential. I don't understand why JOIN is not able to figure this out when timestamp equality is used in ON clause (couple of explain examples are in the fiddle).

Then there are broader questions:

What is general recommended strategy for similar case? We expect that timestamp is essential for our query, so it feels natural to use it as partitioning key.
I've made a short experiment (so no real experiences from it yet) and based the partitioning solely on id range. This seems to have one advantage - predictable partition table sizes (more or less, depending on the size of variable columns, of course). It is possible to add check timestamp ... conditions on any full partition (and open interval check on active one too!) which helps with partition pruning. This has nice benefit that detail table needs single column FK referencing only master.id (perhaps even pruning better during JOINs). Any ideas or experiences with something similar?

We would rather have time-based partitioning, seems more natural, but it's not a hard condition. The need of dragging timestamp to another table and to its FK, etc. makes it less compelling.

Obviously, we want both tables (all, to be precise, we will have more detail table types) partitioned along the same range, be it id or timestamp. I guess not doing so beats the whole purpose of partitioning as we would not be able to remove data related to the master partitions.

I welcome any pointers or ideas on how to do it properly. In the end we will decide for ourselves, but there is not much material to help with the decision right now. Thanks.

Laurenz Albe · Accepted Answer · 2021-08-21 09:20:48Z

3

Your strategy is good. Partition related tables by the common timestamp and make sure that the partition boundaries are the same.

You probably didn't get the efficient partitionwise join because you didn't set enable_partitionwise_join to on. That parameter is turned off by default because it can consume substantial query planning time that you don't want to expend unless you know you can benefit.

answered Aug 21, 2021 at 9:20

Laurenz Albe

257k22 gold badges312 silver badges388 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

virgo47 Over a year ago

Thank you, that's it. Of course I've seen that settings mentioned, but somehow it didn't ring the right bell at the right time. I added SET + the last query with ON using both ID and timestamp - it the only query affected: db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/47 Rough times with those 10k rows went from planning/execution=0.16/4.1ms to 0.23/2.5, I'll recheck with more partitions too, but it seems to do what we need.

Collectives™ on Stack Overflow

Partitioning of related tables in PostgreSQL

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related