My table structure is as described in this post:

 name | version | processed | processing | updated  | ref_time 
------+---------+-----------+------------+----------+----------
 abc  |       1 | t         | f          | 27794395 | 27794160
 def  |       1 | t         | f          | 27794395 | 27793440
 ghi  |       1 | t         | f          | 27794395 | 27793440
 jkl  |       1 | f         | f          | 27794395 | 27794160
 mno  |       1 | t         | f          | 27794395 | 27793440
 pqr  |       1 | f         | t          | 27794395 | 27794160

Based on this answer, I am deriving a list of ref_time values which I want to use as a basis for deleting 'old' entries from status_table.

This is the query to generate the list of relevant ref_time values:

WITH main AS
(
    SELECT ref_time,
        ROUND(AVG(processed::int) * 100, 1) AS percent
    FROM status_table
    GROUP BY ref_time
)
-- Sort in the outer query so that OFFSET skips the two newest ref_time values
SELECT ref_time FROM main WHERE percent = 100
ORDER BY ref_time DESC OFFSET 2;

For example this might return:

 ref_time 
----------
 27794880
 27794160

I can then use this to DELETE all relevant entries in the status_table:

DELETE FROM status_table
WHERE ref_time IN 
(
    WITH main AS
    (
        SELECT ref_time,
            ROUND(AVG(processed::int) * 100, 1) AS percent
        FROM status_table
        GROUP BY ref_time
    )
    -- Sort in the outer query so that OFFSET skips the two newest ref_time values
    SELECT ref_time FROM main WHERE percent = 100
    ORDER BY ref_time DESC OFFSET 2
);

But I have another table named data_table, which also has a ref_time column, and I want to DELETE entries from that table on the same basis, i.e. any rows having ref_time in the above list.

How do I achieve this without duplicating the query used to generate the ref_time list?

  • You may store ref_time values to be deleted into a temporary table and then use it more than once. Commented Nov 6, 2022 at 20:22
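
A minimal sketch of the temporary-table approach suggested in this comment, assuming PostgreSQL. The table name ref_to_delete is hypothetical, and the ORDER BY is placed in the outer SELECT so that OFFSET 2 reliably skips the two most recent fully processed ref_time values:

```sql
BEGIN;

-- Hypothetical temp table holding the ref_time values to purge;
-- ON COMMIT DROP removes it automatically when the transaction ends.
CREATE TEMPORARY TABLE ref_to_delete ON COMMIT DROP AS
WITH main AS
(
    SELECT ref_time,
        ROUND(AVG(processed::int) * 100, 1) AS percent
    FROM status_table
    GROUP BY ref_time
)
SELECT ref_time FROM main WHERE percent = 100
ORDER BY ref_time DESC OFFSET 2;

-- The same list now drives both deletes.
DELETE FROM status_table WHERE ref_time IN (SELECT ref_time FROM ref_to_delete);
DELETE FROM data_table   WHERE ref_time IN (SELECT ref_time FROM ref_to_delete);

COMMIT;
```

Because the list is computed once and stored, both DELETE statements see exactly the same ref_time values, even though the first delete has already removed rows from status_table.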

1 Answer

You can use common table expressions:

with 
    ref as (
        select ref_time 
        from status_table 
        group by ref_time 
        having bool_and(processed)
        order by ref_time desc limit 2
    ),
    del_ref as (
        delete from status_table s
        using ref r
        where s.ref_time = r.ref_time
    )
delete from data_table d
using ref r
where d.ref_time = r.ref_time

The first CTE, ref, returns the list of ref_time values that you want to delete from the two tables. I attempted to simplify the logic: you seem to want the top 2 timestamps that are fully processed (note that offset skips that many rows from the result set, whereas limit keeps only that many).

The second CTE deletes from status_table, and the last part of the query addresses data_table.
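
If the goal is instead to keep the two most recent fully processed batches and delete everything older, the same structure should work with offset in place of limit. This is a sketch under that assumption, not tested against the real tables:

```sql
with
    ref as (
        select ref_time
        from status_table
        group by ref_time
        having bool_and(processed)   -- only fully processed batches
        order by ref_time desc
        offset 2                     -- skip (keep) the two newest, return the rest
    ),
    del_ref as (
        -- data-modifying CTE: executed even though nothing selects from it
        delete from status_table s
        using ref r
        where s.ref_time = r.ref_time
    )
delete from data_table d
using ref r
where d.ref_time = r.ref_time;
```

All parts of the statement run against the same snapshot, so the delete in del_ref does not change what ref returns for the final delete.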

Comments

This is really useful, thanks. Regarding limit vs offset I think it is offset that I need, because I want to delete anything other than the most recent two fully processed batches... hence offset to skip over the top two (the ones I want to keep) when forming the list for DELETE... but no matter... it's the overall structure that matters, and it's really useful to see how it can be done. Out of interest, you seem to prefer putting everything in lower-case, even internal functions... is that personal preference or a recent trend? Thanks!
And it seems that del_ref is defined but not used?
@drmrbrewer: ok, so you want offset indeed. As for the lower case, that's just personal taste, and there is no "standard" here. del_ref is a CTE, so it needs to be named - but indeed it is not used afterwards.
@drmrbrewer: yes, exactly.
@drmrbrewer: yes, r is a dataset, not a single record; the delete ... using ... syntax is really like a join, in essence.
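
On the point that del_ref is defined but never referenced: one way to make use of it, sketched here assuming PostgreSQL and the offset 2 variant discussed above, is to add returning clauses and count the deleted rows. The names del_data, status_rows_deleted and data_rows_deleted are made up for illustration:

```sql
with
    ref as (
        select ref_time
        from status_table
        group by ref_time
        having bool_and(processed)
        order by ref_time desc
        offset 2
    ),
    del_ref as (
        delete from status_table s
        using ref r                      -- behaves like a join against ref
        where s.ref_time = r.ref_time
        returning s.ref_time
    ),
    del_data as (
        delete from data_table d
        using ref r
        where d.ref_time = r.ref_time
        returning d.ref_time
    )
select
    (select count(*) from del_ref)  as status_rows_deleted,
    (select count(*) from del_data) as data_rows_deleted;
```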