Can I delete rows/replace table based on temp table

Question

This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.

I'm thinking this question needs some more thought and information. I'm not seeing DELETE in the example. Furthermore, you seem to be wanting to do an INSERT or UPDATE ("What syntax would replace the current table with just the results of this one?"). What is the current table?
– Adrian Klaver, Commented Jul 20, 2020 at 18:12
This is a good discussion of several different options in the question and answers if you ignore the "slowness" issue: stackoverflow.com/questions/47402098/…
– Mike Organek, Commented Jul 20, 2020 at 18:19

Erwin Brandstetter · Accepted Answer · 2020-07-20 22:10:25Z

Assuming values in firstname, lastname and startdate are never NULL, this simple query with an EXISTS semi-join does the job:

DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)
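To see which rows the statement removes, here is a small throwaway demo. The table name people_demo, the column types and the sample rows are all invented for illustration; only the column names come from the question.

```sql
-- Hypothetical demo table; column types are assumptions.
CREATE TEMP TABLE people_demo (
   id        int  PRIMARY KEY
 , firstname text NOT NULL
 , lastname  text NOT NULL
 , startdate date NOT NULL
 , position  text
);

INSERT INTO people_demo VALUES
   (1, 'Ann', 'Lee', '2019-01-01', 'Analyst')  -- older copy of Ann Lee
 , (2, 'Ann', 'Lee', '2020-03-01', 'Manager')  -- newest Ann Lee
 , (3, 'Bob', 'Ray', '2018-05-01', 'Clerk');   -- no duplicate

-- Same DELETE as above, pointed at the demo table.
DELETE FROM people_demo AS p
WHERE  EXISTS (
   SELECT FROM people_demo AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

SELECT id, firstname, lastname, startdate
FROM   people_demo
ORDER  BY id;
```

Only row 1 has a peer with a later startdate, so it is the only row deleted; rows 2 and 3 remain.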

Can there be identical values in startdate? Then you need a tiebreaker ...
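One possible tiebreaker, as a sketch: if id is unique and never NULL (an assumption, the question does not say so), compare (startdate, id) as a row value. Among rows sharing the latest startdate, the one with the greatest id then survives. Which row should win a tie is your call; this only shows the mechanics.

```sql
-- Assumes people.id is unique and NOT NULL (not confirmed by the question).
DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   -- Row-wise comparison: later startdate wins, id decides equal dates.
   AND    (p1.startdate, p1.id) > (p.startdate, p.id)
   );
```

Without such a tiebreaker, rows that tie on the latest startdate all survive, because none of them has a strictly newer peer.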

This is typically faster than using a subquery with row_number(). There are a hundred and one ways to make it faster still, depending on your precise situation and requirements. See:

How do I (or can I) SELECT DISTINCT on multiple columns?
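For comparison, the row_number() approach from the question could be turned into a DELETE roughly like this. It is a sketch that assumes id uniquely identifies a row, which the question does not confirm.

```sql
-- Assumes people.id is unique; the derived table d numbers each group's rows.
DELETE FROM people AS p
USING (
   SELECT id
        , row_number() OVER (PARTITION BY firstname, lastname
                             ORDER BY startdate DESC) AS rn
   FROM   people
   ) AS d
WHERE  p.id = d.id   -- join back to the target table
AND    d.rn > 1;     -- keep only the first (latest) row per group
```

Unlike the EXISTS variant, this keeps exactly one row per (firstname, lastname) even when startdate ties, but which of the tied rows survives is arbitrary unless you add more columns to ORDER BY.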

If compared columns can be NULL, consider:

How to delete duplicate rows without unique identifier
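One common way to make the name comparison NULL-safe, not necessarily what the linked answer does, is IS NOT DISTINCT FROM, which treats two NULLs as equal. A sketch, assuming only firstname and lastname can be NULL:

```sql
DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   -- NULL-safe equality: NULL matches NULL here, unlike with "=".
   WHERE  p1.firstname IS NOT DISTINCT FROM p.firstname
   AND    p1.lastname  IS NOT DISTINCT FROM p.lastname
   AND    p1.startdate > p.startdate
   );
```

If startdate itself can be NULL, the > comparison evaluates to NULL for those rows, so they are never deleted and never cause another row to be deleted. That case needs its own rule.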

There is a whole dedicated tag for duplicate-removal. Combine it with postgres to narrow down:

https://stackoverflow.com/questions/tagged/duplicates+postgresql

edited Jul 20, 2020 at 22:10

answered Jul 20, 2020 at 22:03

3 Comments

tuskity Over a year ago

Your method looks much more simple than what I was trying to do. The example code isn't my exact situation. I have a serial_no for each row that has duplicates. The compared date doesn't have nulls, but there can be ties. Can you elaborate on how I could include a tie breaker?

Erwin Brandstetter Over a year ago

@tuskity: Please clarify the question accordingly. Tell us what ties can happen and how you want to break ties, then I can tell you how to implement it. You might alternatively start a new question for this with details.

tuskity Over a year ago

I set up a new question for the topic here: stackoverflow.com/questions/63005307/…
