1

I've checked documentation and saw some presentations, read blogs, but can't find examples of partitioning of more than a single table in PostgreSQL - and that's what we need. Our tables are insert only audit trail with master-detail structure and we aim to solve our problem with slow data removal problem, currently done using delete.

The simplified structure and some queries are shown in the following fiddle: https://www.db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/46

The issue I'm investigating right now is how to effectively query the detail table, be it in JOIN or directly. Because the timestamp field is part of the partition key I understand that using it in query is essential. I don't understand why JOIN is not able to figure this out when timestamp equality is used in ON clause (couple of explain examples are in the fiddle).

Then there are broader questions:

  1. What is general recommended strategy for similar case? We expect that timestamp is essential for our query, so it feels natural to use it as partitioning key.

  2. I've made a short experiment (so no real experiences from it yet) and based the partitioning solely on id range. This seems to have one advantage - predictable partition table sizes (more or less, depending on the size of variable columns, of course). It is possible to add check timestamp ... conditions on any full partition (and open interval check on active one too!) which helps with partition pruning. This has nice benefit that detail table needs single column FK referencing only master.id (perhaps even pruning better during JOINs). Any ideas or experiences with something similar?

We would rather have time-based partitioning, seems more natural, but it's not a hard condition. The need of dragging timestamp to another table and to its FK, etc. makes it less compelling.

Obviously, we want both tables (all, to be precise, we will have more detail table types) partitioned along the same range, be it id or timestamp. I guess not doing so beats the whole purpose of partitioning as we would not be able to remove data related to the master partitions.

I welcome any pointers or ideas on how to do it properly. In the end we will decide for ourselves, but there is not much material to help with the decision right now. Thanks.

1 Answer 1

3

Your strategy is good. Partition related tables by the common timestamp and make sure that the partition boundaries are the same.

You probably didn't get the efficient partitionwise join because you didn't set enable_partitionwise_join to on. That parameter is turned off by default because it can consume substantial query planning time that you don't want to expend unless you know you can benefit.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, that's it. Of course I've seen that settings mentioned, but somehow it didn't ring the right bell at the right time. I added SET + the last query with ON using both ID and timestamp - it the only query affected: db-fiddle.com/f/2mRXT4wGjM2ZSftjgKyZce/47 Rough times with those 10k rows went from planning/execution=0.16/4.1ms to 0.23/2.5, I'll recheck with more partitions too, but it seems to do what we need.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.