
I'm trying to figure out how I can delete duplicate rows from my database but keep one:

|---------------------------|
| id   titleid     version  |
|---------------------------|
| 1   TEST1        1.60     | <--- keep
| 2   TEST1        1.60     | <--- delete
| 3   TEST1        1.60     | <--- delete
| 4   TEST1        1.60     | <--- delete
| 5   TEST55       1.55     | <--- not selected
| 6   TEST88       1.85     | <--- not selected
| 7   TEST56       1.60     | <--- keep
| 8   TEST56       1.60     | <--- delete
|---------------------------|

I've figured out how to select the rows that have duplicates:

SELECT a.*
FROM patch a
JOIN (
    SELECT titleid, version, COUNT(*)
    FROM patch
    GROUP BY titleid, version
    HAVING count(*) > 1
) b 
ON a.titleid = b.titleid
AND a.version = b.version 
ORDER BY a.version

How can I modify this query so it deletes the duplicate rows, but keeps one?

I've looked on SO and Google for answers, but none of them seem to work or fit my needs.

3 Answers


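Now that a primary key (`id`) has been added, join the table to itself and delete every row for which another row with the same `titleid` and `version` has a smaller `id`: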

DELETE t1.*
FROM patch t1
JOIN patch t2 USING (titleid, version)
WHERE t1.id > t2.id;
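Because a DELETE like this is hard to undo without a backup, it may be worth previewing the rows it will remove first. The SELECT below reuses the same self-join; `DISTINCT` is needed because a row such as id 3 matches both id 1 and id 2 and would otherwise be listed twice. This is a sketch that rebuilds the sample data from the question; the column types in the CREATE TABLE are assumptions, since the question doesn't show them:

```sql
-- Rebuild the sample data from the question (column types are assumed).
DROP TABLE IF EXISTS patch;
CREATE TABLE patch (
  id      INT PRIMARY KEY,
  titleid VARCHAR(20) NOT NULL,
  version DECIMAL(4,2) NOT NULL
);
INSERT INTO patch (id, titleid, version) VALUES
  (1, 'TEST1', 1.60), (2, 'TEST1', 1.60), (3, 'TEST1', 1.60), (4, 'TEST1', 1.60),
  (5, 'TEST55', 1.55), (6, 'TEST88', 1.85), (7, 'TEST56', 1.60), (8, 'TEST56', 1.60);

-- Preview: every row that has a lower-id row with the same titleid and version.
-- These are exactly the rows the DELETE above would remove; nothing is changed here.
SELECT DISTINCT t1.*
FROM patch t1
JOIN patch t2 USING (titleid, version)
WHERE t1.id > t2.id
ORDER BY t1.id;
-- expected: ids 2, 3, 4 and 8
```

If the preview lists the rows you expect, run the DELETE with the same FROM/JOIN/WHERE.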


This answers the original version of the question, before the `id` column was added.

The simplest method in this case is to empty the table and rebuild it:

create table temp_t as
    select distinct titleid, version
    from t;

truncate table t;   -- back it up first!

insert into t (titleid, version)
    select titleid, version
    from temp_t;

An alternative method is to add an auto-incremented primary key column and then use that for deletion:

alter table t add column id int auto_increment primary key;

delete t
from t left join
     (select titleid, version, min(id) as min_id
      from t
      group by titleid, version
     ) tt
     on t.id = tt.min_id
where tt.min_id is null;

alter table t drop column id;

Here is a db<>fiddle with this version.

Comments

Thank you for your solutions, Gordon. Is it possible to write a query that doesn't create another table or rebuild the table?
@AppelFlap There is nothing in your table that lets you distinguish one duplicate record from another. Maybe you have simplified the real structure, and some primary key or index exists?
@Akina Do you mean an auto-increment id? If so, I've modified my question to include it.
@AppelFlap Any primary or unique index - either synthetic or natural.
@AppelFlap . . .(1) I consider it rude to change a question so that existing answers are invalidated. (2) The second solution proposed is essentially the same as the other answer, except it explicitly adds the id in.

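Delete every row whose id is not the minimum id for its combination of titleid and version. The inner query is wrapped in a derived table (`t`) because MySQL rejects a subquery that selects directly from the table being deleted from (error 1093, "You can't specify target table ... for update in FROM clause"):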

delete from patch
where id not in (
  select t.id from (
    select min(id) id
    from patch
    group by titleid, version
  ) t  
);

See the demo.
Results:

| id  | titleid | version |
| --- | ------- | ------- |
| 1   | TEST1   | 1.6     |
| 5   | TEST55  | 1.55    |
| 6   | TEST88  | 1.85    |
| 7   | TEST56  | 1.6     |
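On MySQL 8.0 or later, the same result can also be reached with the `ROW_NUMBER()` window function instead of `GROUP BY` and `MIN(id)`. This is a different technique from the answer above, not part of it. A minimal sketch, assuming MySQL 8.0+ and guessing the column types, since the question doesn't give them:

```sql
-- Rebuild the sample data from the question (column types are assumed).
DROP TABLE IF EXISTS patch;
CREATE TABLE patch (
  id      INT PRIMARY KEY,
  titleid VARCHAR(20) NOT NULL,
  version DECIMAL(4,2) NOT NULL
);
INSERT INTO patch (id, titleid, version) VALUES
  (1, 'TEST1', 1.60), (2, 'TEST1', 1.60), (3, 'TEST1', 1.60), (4, 'TEST1', 1.60),
  (5, 'TEST55', 1.55), (6, 'TEST88', 1.85), (7, 'TEST56', 1.60), (8, 'TEST56', 1.60);

-- Number the rows inside each (titleid, version) group, lowest id first,
-- then delete every row that is not first in its group.
-- The derived table "ranked" is materialized, which avoids error 1093.
DELETE FROM patch
WHERE id IN (
  SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY titleid, version ORDER BY id) AS rn
    FROM patch
  ) AS ranked
  WHERE rn > 1
);

SELECT * FROM patch ORDER BY id;
-- expected: ids 1, 5, 6 and 7 remain
```

Ordering by `id` inside each partition keeps the lowest id, matching the results above; `ORDER BY id DESC` would keep the highest id of each group instead.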
