
How can I run a query that finds duplicate rows? It needs to match on multiple fields, not just one.

Here is the EXPLAIN of the table.

+-------------+--------------+------+-----+-------------------+----------------+
| Field       | Type         | Null | Key | Default           | Extra          |
+-------------+--------------+------+-----+-------------------+----------------+
| id          | int(11)      | NO   | PRI | NULL              | auto_increment | 
| token       | varchar(64)  | NO   | MUL | NULL              |                | 
| maxvar      | float        | NO   |     | NULL              |                | 
| maxvbr      | float        | NO   |     | NULL              |                | 
| minvcr      | float        | NO   |     | NULL              |                | 
| minvdr      | float        | NO   |     | NULL              |                | 
| atype       | int(11)      | NO   |     | NULL              |                | 
| avalue      | varchar(255) | NO   |     | NULL              |                | 
| createddate | timestamp    | NO   |     | CURRENT_TIMESTAMP |                | 
| timesrun    | int(11)      | NO   |     | NULL              |                | 
+-------------+--------------+------+-----+-------------------+----------------+

I need to find all rows that match on token, maxvar, maxvbr, minvcr, minvdr, atype and avalue. If all of those fields match those in another row, treat it as a "duplicate".

Ultimately I want to run this as a delete command but I can easily alter the select.

UPDATE: Still looking for a solution that deletes with a single query in MySQL.

4 Answers

2

Just join the table to itself and compare the rows. You can keep the duplicate with the lowest id by requiring that the id being deleted be greater than the id of a matching row:

DELETE FROM my_table WHERE id IN (
    SELECT DISTINCT t1.id
    FROM my_table t1
        JOIN my_table t2
            ON t1.id > t2.id
            AND t1.token = t2.token AND t1.maxvar = t2.maxvar
            AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
            AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
            AND t1.avalue = t2.avalue)

6 Comments

Some of them have 8 using the examples above. Will this only leave 1?
Yeah, because I am telling it to delete anything with an id bigger than that of a duplicate (the dupe with the lowest id will then stay). Although as always, back up your database first!
No, this won't let me update the same table. #1093 - You can't specify target table 'my_table' for update in FROM clause.....and yes using correct table name :-)
@LeeA: Hmm, you might have to duplicate the table (this totally works in Postgres though). Refer to one table in the select, the other in the delete/update.
Hmmm ok thanks, I wonder if there is any way that I can batch this up.
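
The suggestion above to refer to one table in the select and another in the delete could be sketched in MySQL with a temporary table that holds the ids to remove. This is only a sketch: dupe_ids is a made-up name, my_table is the placeholder used in this answer, and it takes three statements rather than the single query the question asks for.

```sql
-- Step 1: collect the ids to delete, i.e. every row that has a
-- matching row with a lower id. dupe_ids is a hypothetical name.
CREATE TEMPORARY TABLE dupe_ids AS
SELECT DISTINCT t1.id
FROM my_table t1
    JOIN my_table t2
        ON t1.id > t2.id
        AND t1.token = t2.token AND t1.maxvar = t2.maxvar
        AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
        AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
        AND t1.avalue = t2.avalue;

-- Step 2: delete by id. The subquery reads dupe_ids, not my_table,
-- so the #1093 restriction should not be triggered.
DELETE FROM my_table WHERE id IN (SELECT id FROM dupe_ids);

-- Step 3: clean up.
DROP TEMPORARY TABLE dupe_ids;
```

Running SELECT COUNT(*) FROM dupe_ids between steps 1 and 2 shows how many rows are about to be removed.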
1

This query finds all duplicate records that should be deleted:

SELECT t1.id
FROM table_duplicates t1
  INNER JOIN (
    SELECT MIN(id) AS id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    FROM table_duplicates
    GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    HAVING COUNT(*) > 1
  ) t2
  ON t1.id <> t2.id
    AND t1.token = t2.token AND t1.maxvar = t2.maxvar
    AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
    AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
    AND t1.avalue = t2.avalue;

This query removes the duplicates, keeping the row with the lowest id in each group:

DELETE t1
FROM table_duplicates t1
  INNER JOIN (
    SELECT MIN(id) AS id, token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    FROM table_duplicates
    GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    HAVING COUNT(*) > 1
  ) t2
  ON t1.id <> t2.id
    AND t1.token = t2.token AND t1.maxvar = t2.maxvar
    AND t1.maxvbr = t2.maxvbr AND t1.minvcr = t2.minvcr
    AND t1.minvdr = t2.minvdr AND t1.atype = t2.atype
    AND t1.avalue = t2.avalue;


1
SELECT      token, maxvar, maxvbr, minvcr, minvdr, atype, avalue,
            COUNT(*)
FROM        yourtable
GROUP BY    token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING      COUNT(*) > 1

This query returns every combination of values that appears in the table two or more times, along with how often it appears.

1 Comment

Perfect, is there a way to delete all of them but only leave 1 intact?
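
One way to do that in MySQL, sketched under the assumption that the table is called yourtable as above and that the row with the lowest id in each group is the one to keep (keep_id and keepers are names invented for this example):

```sql
-- Delete every row whose id is not the lowest id of its group.
-- keep_id and keepers are hypothetical names used only in this sketch.
DELETE FROM yourtable
WHERE id NOT IN (
    SELECT keep_id FROM (
        -- One surviving id per distinct combination of the seven columns.
        SELECT MIN(id) AS keep_id
        FROM yourtable
        GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
    ) AS keepers
);
```

The extra derived-table layer is there because MySQL otherwise refuses to select from the table being deleted from (error #1093). Running the innermost SELECT on its own first shows which ids would survive.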
1

Try:

SELECT token, maxvar, maxvbr, minvcr, minvdr, atype, avalue, COUNT(*)
FROM yourtable  -- placeholder; TABLE itself is a reserved word in MySQL
GROUP BY token, maxvar, maxvbr, minvcr, minvdr, atype, avalue
HAVING COUNT(*) > 1

