SQL - delete duplicate rows

Question

I'm trying to figure out how I can delete duplicate rows from my database but keep one:

|---------------------------|
| id   titleid     version  |
|---------------------------|
| 1   TEST1        1.60     | <--- keep
| 2   TEST1        1.60     | <--- delete
| 3   TEST1        1.60     | <--- delete
| 4   TEST1        1.60     | <--- delete
| 5   TEST55       1.55     | <--- not selected
| 6   TEST88       1.85     | <--- not selected
| 7   TEST56       1.60     | <--- keep
| 8   TEST56       1.60     | <--- delete
|---------------------------|

I've been able to figure out how to select the rows that have duplicates:

SELECT a.*
FROM patch a
JOIN (
    SELECT titleid, version, COUNT(*)
    FROM patch
    GROUP BY titleid, version
    HAVING count(*) > 1
) b 
ON a.titleid = b.titleid
AND a.version = b.version 
ORDER BY a.version

How can I modify this query so it deletes the duplicate rows, but keeps one?

I've looked on SO and Google for answers but none seem to work/fit my needs.

Akina · Accepted Answer · 2020-01-04 13:27:24Z

2

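Now that the question includes a primary key (`id`), a self-join delete works. Every row that has a matching row with a smaller `id` is removed, so only the lowest `id` in each group survives:

DELETE t1.*
FROM patch t1
JOIN patch t2 USING (titleid, version)
WHERE t1.id > t2.id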
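
One way to sanity-check the join condition before running the DELETE is to try it on a throwaway copy of the sample data. The sketch below is a stand-in under stated assumptions, not the MySQL statement itself: it uses Python's built-in `sqlite3` module with an in-memory database, stores `version` as text, and, because SQLite has no multi-table `DELETE ... JOIN`, runs the same self-join as a `SELECT` and then applies the resulting ids with `DELETE ... WHERE id IN (...)`.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patch (id INTEGER PRIMARY KEY, titleid TEXT, version TEXT)"
)
conn.executemany(
    "INSERT INTO patch (id, titleid, version) VALUES (?, ?, ?)",
    [
        (1, "TEST1", "1.60"),
        (2, "TEST1", "1.60"),
        (3, "TEST1", "1.60"),
        (4, "TEST1", "1.60"),
        (5, "TEST55", "1.55"),
        (6, "TEST88", "1.85"),
        (7, "TEST56", "1.60"),
        (8, "TEST56", "1.60"),
    ],
)

# Same join condition as the DELETE: a row qualifies when another row with
# the same (titleid, version) has a smaller id.
duplicates_sql = """
    SELECT DISTINCT t1.id
    FROM patch t1
    JOIN patch t2 USING (titleid, version)
    WHERE t1.id > t2.id
"""
to_delete = sorted(row[0] for row in conn.execute(duplicates_sql))
print(to_delete)  # [2, 3, 4, 8]

# SQLite has no multi-table DELETE, so the id list is applied with IN.
conn.execute(f"DELETE FROM patch WHERE id IN ({duplicates_sql})")
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM patch ORDER BY id")]
print(remaining)  # [1, 5, 6, 7]
```

The ids listed by the preview match the rows marked "delete" in the question, and the surviving ids match the rows marked "keep" or "not selected".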

answered Jan 4, 2020 at 13:27


Gordon Linoff · Accepted Answer · 2020-01-04 17:50:59Z

2

This answers the original version of the question, which did not include an `id` column.

The simplest method in this case is to empty the table and rebuild it:

create table temp_t as
    select distinct titleid, version
    from t;

truncate table t;   -- back it up first!

insert into t (titleid, version)
    select titleid, version
    from temp_t;
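
The rebuild can be tried out on sample data before touching a real table. This is only a sketch under assumptions: it uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL, names the columns `titleid` and `version` as in the question, omits the `id` column to mirror the original question, and substitutes `DELETE FROM t` for `TRUNCATE TABLE`, which SQLite does not have.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# No id column, as in the original version of the question.
conn.execute("CREATE TABLE t (titleid TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO t (titleid, version) VALUES (?, ?)",
    [
        ("TEST1", "1.60"),
        ("TEST1", "1.60"),
        ("TEST1", "1.60"),
        ("TEST1", "1.60"),
        ("TEST55", "1.55"),
        ("TEST88", "1.85"),
        ("TEST56", "1.60"),
        ("TEST56", "1.60"),
    ],
)

# Step 1: copy the distinct rows into a scratch table.
conn.execute("CREATE TABLE temp_t AS SELECT DISTINCT titleid, version FROM t")

# Step 2: empty the original table (SQLite substitute for TRUNCATE TABLE).
conn.execute("DELETE FROM t")

# Step 3: reload the distinct rows and drop the scratch table.
conn.execute(
    "INSERT INTO t (titleid, version) SELECT titleid, version FROM temp_t"
)
conn.execute("DROP TABLE temp_t")
conn.commit()

rebuilt = sorted(conn.execute("SELECT titleid, version FROM t"))
print(rebuilt)
```

After the rebuild, each `(titleid, version)` pair appears exactly once. Note that this approach only fits tables whose rows are fully described by the selected columns; any other column would be lost in the copy.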

An alternative method is to add an auto-incremented primary key column and then use that for deletion:

alter table t add column id int auto_increment primary key;

delete t
from t left join
     (select titleid, version, min(id) as min_id
      from t
      group by titleid, version
     ) tt
     on t.id = tt.min_id
where tt.min_id is null;

alter table t drop column id;

Here is a db<>fiddle with this version.

edited Jan 4, 2020 at 17:50

answered Jan 4, 2020 at 12:43


6 Comments

Appel Flap Over a year ago

Thank you for your solutions, Gordon. Is it possible to write a query that doesn't create another table or rebuild the existing one?

Akina Over a year ago

@AppelFlap There is nothing in your table that distinguishes one duplicate record from another. Maybe you have simplified the real structure, and some primary index exists?

Appel Flap Over a year ago

@Akina Do you mean whether there's an auto-increment id? If so, I've modified my question to include it.

Akina Over a year ago

@AppelFlap Any primary or unique index - either synthetic or natural.

Gordon Linoff Over a year ago

@AppelFlap . . .(1) I consider it rude to change a question so that existing answers are invalidated. (2) The second solution proposed is essentially the same as the other answer, except it explicitly adds the id in.


forpas · Accepted Answer · 2020-01-04 13:27:11Z

1

You must delete every row whose id is not the minimum id for its combination of titleid and version:

delete from patch
where id not in (
  select t.id from (
    select min(id) id
    from patch
    group by titleid, version
  ) t  
);

See the demo.
Results:

| id  | titleid | version |
| --- | ------- | ------- |
| 1   | TEST1   | 1.6     |
| 5   | TEST55  | 1.55    |
| 6   | TEST88  | 1.85    |
| 7   | TEST56  | 1.6     |
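
Because a DELETE like this is destructive, it can help to do a dry run first: execute it inside a transaction, look at the affected row count, and roll back. The sketch below illustrates that idea under assumptions: it runs the statement above against an in-memory SQLite database through Python's built-in `sqlite3` module rather than against MySQL, and stores `version` as text (so it shows `1.60` where the result table shows `1.6`).

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patch (id INTEGER PRIMARY KEY, titleid TEXT, version TEXT)"
)
conn.executemany(
    "INSERT INTO patch (id, titleid, version) VALUES (?, ?, ?)",
    [
        (1, "TEST1", "1.60"),
        (2, "TEST1", "1.60"),
        (3, "TEST1", "1.60"),
        (4, "TEST1", "1.60"),
        (5, "TEST55", "1.55"),
        (6, "TEST88", "1.85"),
        (7, "TEST56", "1.60"),
        (8, "TEST56", "1.60"),
    ],
)
conn.commit()  # make the sample data permanent before experimenting

# The statement from the answer: keep only the minimum id per group.
dedupe_sql = """
    DELETE FROM patch
    WHERE id NOT IN (
      SELECT t.id FROM (
        SELECT MIN(id) AS id
        FROM patch
        GROUP BY titleid, version
      ) t
    )
"""

# Dry run: execute, record how many rows were affected, then undo it.
cur = conn.execute(dedupe_sql)
deleted_in_dry_run = cur.rowcount
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM patch").fetchone()[0]
print(deleted_in_dry_run, rows_after_rollback)  # 4 8

# Real run: execute again and commit.
conn.execute(dedupe_sql)
conn.commit()
kept = conn.execute(
    "SELECT id, titleid, version FROM patch ORDER BY id"
).fetchall()
print([row[0] for row in kept])  # [1, 5, 6, 7]
```

The dry run reports four affected rows and leaves all eight rows in place after the rollback; the committed run keeps ids 1, 5, 6 and 7, as in the result table.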

edited Jan 4, 2020 at 13:27

answered Jan 4, 2020 at 13:21

