2

This code gives me a table of the unique values (without duplicates):

SELECT id, firstname, lastname, startdate, position
FROM  (
   SELECT id, firstname, lastname, startdate, position,
     ROW_NUMBER() OVER (PARTITION BY (firstname, lastname) ORDER BY startdate DESC) rn
   FROM people
   ) tmp
WHERE rn = 1;

What syntax would replace the current table with just the results of this one?

Alternatively, I could use WHERE rn <> 1 to get all the data I want to delete, but again, I am struggling to get the syntax of the DELETE right using this method.

2
  • I'm thinking this question needs some more thought and information. I'm not seeing DELETE in the example. Furthermore you seem to be wanting to do an INSERT or UPDATE("What syntax would replace the current table with just the results of this one?") What is the current table? Commented Jul 20, 2020 at 18:12
  • This is a good discussion of several different options in the question and answers if you ignore the "slowness" issue: stackoverflow.com/questions/47402098/… Commented Jul 20, 2020 at 18:19

1 Answer 1

1

Assuming values in firstname, lastname and startdate are never NULL, this simple query with a NOT EXISTS anti-semi-join does the job:

DELETE FROM people AS p
WHERE  EXISTS (
   SELECT FROM people AS p1
   WHERE  p1.firstname = p.firstname
   AND    p1.lastname  = p.lastname
   AND    p1.startdate > p.startdate
   );

It deletes every row where a newer copy exists, effectively keeping the latest row per group of peers. (Of course, (firstname, lastname) is a poor way of establishing identity. There are many distinct people with identical names. The demo may be simplified ...)

Can there be identical values in startdate? Then you need a tiebreaker ...

Typically faster than using a subquery with row_number(). There are a hundred and one ways to make this faster, depending on your precise situation and requirements. See:

If compared columns can be NULL, consider:

There is a whole dedicated tag for . Combine it with to narrow down:

Sign up to request clarification or add additional context in comments.

3 Comments

Your method looks much more simple than what I was trying to do. The example code isn't my exact situation. I have a serial_no for each row that has duplicates. The compared date doesn't have nulls, but there can be ties. Can you elaborate on how I could include a tie breaker?
@tuskity: Please clarify the question accordingly. Tell us what ties can happen and how you want to break ties, then I can tell you how to implement it. You might alternatively start a new question for this with details.
I set up a new question for the topic here: stackoverflow.com/questions/63005307/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.