MySQL Duplicate rows - specify columns

Question

How can I run a query that finds duplicate rows? It needs to match on multiple fields, not just one.

Here is the EXPLAIN of the table.

+-------------+--------------+------+-----+-------------------+----------------+
| Field       | Type         | Null | Key | Default           | Extra          |
+-------------+--------------+------+-----+-------------------+----------------+
| id          | int(11)      | NO   | PRI | NULL              | auto_increment | 
| token       | varchar(64)  | NO   | MUL | NULL              |                | 
| maxvar      | float        | NO   |     | NULL              |                | 
| maxvbr      | float        | NO   |     | NULL              |                | 
| minvcr      | float        | NO   |     | NULL              |                | 
| minvdr      | float        | NO   |     | NULL              |                | 
| atype       | int(11)      | NO   |     | NULL              |                | 
| avalue      | varchar(255) | NO   |     | NULL              |                | 
| createddate | timestamp    | NO   |     | CURRENT_TIMESTAMP |                | 
| timesrun    | int(11)      | NO   |     | NULL              |                | 
+-------------+--------------+------+-----+-------------------+----------------+

I need to find all rows that match on token, maxvar, maxvbr, minvcr, minvdr, atype and avalue. If all of those fields match those in another row, then treat it as a "duplicate".

Ultimately I want to run this as a delete command but I can easily alter the select.

UPDATE: Still looking for a solution that deletes with a single query in MySQL.

Dave · Accepted Answer · 2011-09-29 09:53:01Z

2

Just join the table to itself and compare the rows. To keep the duplicate with the lowest id, only delete rows whose id is greater than the id of a matching row:

DELETE FROM my_table WHERE id IN (
    SELECT DISTINCT t1.id
    FROM my_table t1
        JOIN my_table t2
    WHERE t1.id > t2.id
        AND t1.token = t2.token AND t1.maxvar = t2.maxvar
        AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
        AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
        AND t1.avalue = t2.avalue)

edited Sep 29, 2011 at 9:53

answered Sep 29, 2011 at 9:47

6 Comments

Lee Armstrong Over a year ago

Some of them have 8 using the examples above. Will this only leave 1?

Dave Over a year ago

Yeah, because I am telling it to delete anything with an id bigger than a duplicate (the dupe with the lowest id will then stay). Although as always, back up your database first!

Lee Armstrong Over a year ago

No, this won't let me update the same table. #1093 - You can't specify target table 'my_table' for update in FROM clause.....and yes using correct table name :-)

Dave Over a year ago

@LeeA: Hmm, you might have to duplicate the table (this totally works in Postgres though). Refer to one table in the select, the other in the delete/update.

Lee Armstrong Over a year ago

Hmmm ok thanks, I wonder if there is any way that I can batch this up.
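
One possible way around error #1093 without copying the table is MySQL's multi-table DELETE syntax, where the join itself picks the rows to remove and no subquery on the target table is needed. The following is a minimal sketch, assuming the placeholder table name my_table from the answer and the column names from the question's table description. It has not been run against the asker's data, so try it on a backup first.

```sql
-- Sketch: delete every row that has a matching row with a lower id,
-- so only the lowest-id row of each duplicate group survives.
-- my_table is the placeholder table name used in the answer above.
DELETE t1
FROM my_table AS t1
JOIN my_table AS t2
  ON  t1.id     > t2.id
  AND t1.token  = t2.token
  AND t1.maxvar = t2.maxvar
  AND t1.maxvbr = t2.maxvbr
  AND t1.minvcr = t2.minvcr
  AND t1.minvdr = t2.minvdr
  AND t1.atype  = t2.atype
  AND t1.avalue = t2.avalue;
```

Because every higher-id copy matches the lowest-id row of its group, a group of 8 identical rows should end up with 1 row.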

Devart · Accepted Answer · 2011-10-03 07:14:41Z

This query will find all duplicate records which should be deleted -

SELECT t1.id FROM table_duplicates t1
  INNER JOIN (
    SELECT MIN(id) id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue FROM table_duplicates
    GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    HAVING COUNT(*) > 1
  ) t2
  ON t1.id <> t2.id AND t1.token = t2.token AND t1.maxvar=t2.maxvar AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype AND t1.avalue = t2.avalue;

This query will remove all duplicates -

DELETE t1 FROM table_duplicates t1
  INNER JOIN (
    SELECT MIN(id) id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue FROM table_duplicates
    GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    HAVING COUNT(*) > 1
  ) t2
  ON t1.id <> t2.id AND t1.token = t2.token AND t1.maxvar=t2.maxvar AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype AND t1.avalue = t2.avalue;
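
Before running the DELETE, it may help to check how many rows it is expected to remove. A rough sketch, assuming the table_duplicates name from this answer (the alias rows_to_delete is only illustrative): the total row count minus the number of distinct value combinations gives the number of surplus rows. MySQL's COUNT(DISTINCT ...) skips combinations containing NULL, which should not matter here because all of these columns are declared NOT NULL.

```sql
-- Sketch: number of rows the DELETE above is expected to remove.
-- rows_to_delete is an illustrative alias, not part of the original answer.
SELECT COUNT(*)
       - COUNT(DISTINCT token, maxvar, maxvbr, minvcr, minvdr, atype, avalue)
       AS rows_to_delete
FROM table_duplicates;
```

If the result is 0, there are no duplicates under this definition and the DELETE should affect no rows.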

Maximilian Mayerl · Accepted Answer · 2011-09-29 09:42:54Z

1

SELECT      token, maxvar, maxvbr, minvcr, minvdr, atype, avalue,
            COUNT(*)
FROM        yourtable
GROUP BY    token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING      COUNT(*) > 1

This query returns one row for each combination of values that appears in the table two or more times, along with how often it appears.

answered Sep 29, 2011 at 9:42

1 Comment

Lee Armstrong Over a year ago

Perfect, is there a way to delete all of them but only leave 1 intact?
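
One way this might be done, sketched under the assumption that the table is called yourtable as in the answer and that the lowest id in each group is the row to keep (keep_id and keepers are illustrative names): group on the same columns, take MIN(id) per group, and delete everything else. The extra derived-table layer is there because MySQL otherwise refuses to select from the table being deleted from (error #1093).

```sql
-- Sketch: keep the lowest-id row of every group and delete the rest.
-- The inner GROUP BY query is wrapped in a derived table (keepers) so that
-- MySQL materializes it instead of reading yourtable while deleting from it.
DELETE FROM yourtable
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM yourtable
        GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    ) AS keepers
);
```

Rows that have no duplicate form a group of one, so their own id is the MIN(id) and they are left untouched. As with any bulk delete, back up the table first.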

Marco · Accepted Answer · 2011-09-29 10:06:22Z

1

Try:

SELECT token, maxvar, maxvbr, minvcr, minvdr, atype, avalue, COUNT(*)
FROM yourtable
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1

edited Sep 29, 2011 at 10:06

answered Sep 29, 2011 at 9:43
