Postgres delete all duplicate records but one by sorting

Question

I would like to know, in Postgres, how to delete all duplicate records but one by sorting by a column.

Let's say I have the following table foo:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  1 | foo  | sydney    | 2018-05-24 15:40:32.593745+10
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  3 | foo  | sydney    | 2018-05-29 22:17:02.927263+10
                  4 | foo  | sydney    | 2018-06-13 16:44:32.703174+10
                  5 | foo  | sydney    | 2018-06-13 16:45:01.324273+10
                  6 | foo  | sydney    | 2018-06-13 17:04:49.487767+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

I would like to delete all duplicates by checking for (name, region) tuple, but keep the one with the greatest created_at column. The result would be:

 id                 | name |  region   |          created_at
--------------------+------+-----------+-------------------------------
                  2 | foo  | melbourne | 2018-05-24 17:28:59.452225+10
                  7 | foo  | sydney    | 2018-06-13 17:05:13.592844+10

But I do not know where to begin. Any ideas?

esqew · Accepted Answer · 2018-10-16 03:37:13Z

1

Use a subquery with ROW_NUMBER and PARTITION BY to filter out rows that have duplicate regions while retaining the most recent in each region. Ensure your subquery uses the AS keyword to prevent Postgre syntax errors:

SELECT * 
FROM foo 
WHERE id IN (
  SELECT a.id 
  FROM (
    SELECT id, ROW_NUMBER() OVER (
        PARTITION BY region 
        ORDER BY created_at DESC
    ) row_no 
    FROM foo
  ) AS a 
  WHERE row_no > 1
);

... returns the rows to be deleted. Replace SELECT * with DELETE when you're satisfied with the result to delete the rows.

SQLFiddle demo

answered Oct 16, 2018 at 3:37

esqew

45k28 gold badges139 silver badges180 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

mc9 Over a year ago

Why are we using PARTITION BY region rather than PARTITION BY region, name? What is the difference?

esqew Over a year ago

Your example indicates your dataset has the same value for name across the board - adding it to the PARTITION BY would be immaterial to the output for the example.

AmeyaN99 · Accepted Answer · 2018-10-16 03:29:23Z

1

DELETE FROM foo
      WHERE id IN
               (SELECT id
                  FROM (SELECT id,
                               ROW_NUMBER ()
                               OVER (PARTITION BY region
                                     ORDER BY created_at DESC)
                                  row_no
                          FROM foo)
                 WHERE row_no > 1)

answered Oct 16, 2018 at 3:29

AmeyaN99

1724 bronze badges

Collectives™ on Stack Overflow

Postgres delete all duplicate records but one by sorting

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related