Trying to delete duplicate rows based on a hash in MySQL

Question

I'm trying to delete duplicate values (which will all have the same nid) based on the hash value.

I want to keep the initial (oldest) row for each combination of nid and hash.

For some reason, I get the error "You can't specify target table 'node_revision' for update in FROM clause".

I'm trying to alias my tables, but that doesn't seem to work - what am I doing wrong?

delete from node_revision
WHERE nid NOT IN(SELECT MIN(nid) FROM node_revision GROUP BY hash)

(The timestamp column is just for illustration; I don't actually want it used in any queries.)

|  nid  |  hash   |  timestamp  |
|   2   | 123456  |  123364600  |
|   2   | 123456  |  123364601  |
|   2   | 1234567 |  123364602  |

Rows 1 and 3 would survive in this case.

Because it was just to illustrate that row 1 is older than row 2. In the actual database, row 1 and row 2 will typically have the exact same timestamp.
– Steven Matthews, Commented Feb 25, 2014 at 21:43
Is it really that difficult to tell MySQL, "Hey, if something has the same nid value and the same hash value, delete one of them!"?
– Steven Matthews, Commented Feb 25, 2014 at 21:44
Yes, it is that difficult. In general, databases work by looking at the data in a row, and there is no way in standard SQL to distinguish between two rows that have exactly the same values.
– Gordon Linoff, Commented Feb 25, 2014 at 22:35

Gordon Linoff · Accepted Answer · 2014-02-25 21:42:04Z

You can phrase this as a left join:

delete nr from node_revision nr left join
               (SELECT MIN(nid) as minnid
                FROM node_revision
                GROUP BY hash
               ) nrkeep
               on nr.nid = nrkeep.minnid
    where nrkeep.minnid is null;

You can also "trick" MySQL into using the subquery:

DELETE FROM node_revision
    WHERE nid NOT IN (SELECT minnid
                      FROM (SELECT MIN(nid) as minnid FROM node_revision GROUP BY hash
                           ) t
                     );

MySQL has a well-documented limitation: an UPDATE or DELETE statement cannot select, in a subquery, from the table it is modifying. This query gets around the limitation by wrapping the subquery in a derived table (t), so that MySQL materializes the list of minnids before the delete runs.
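
Before running either statement, it may help to preview what it would remove. The following is only a sketch for a scratch database: the column types are guesses based on the sample rows in the question, and the SELECT simply reuses the left-join form with the DELETE swapped out.

```sql
-- Assumed schema; the real column types are not shown in the question.
CREATE TABLE node_revision (
    nid         INT NOT NULL,
    hash        VARCHAR(32) NOT NULL,
    `timestamp` INT NOT NULL
);

-- The three sample rows from the question.
INSERT INTO node_revision (nid, hash, `timestamp`) VALUES
    (2, '123456',  123364600),
    (2, '123456',  123364601),
    (2, '1234567', 123364602);

-- Same anti-join as the DELETE, written as a SELECT
-- so it lists the rows that the DELETE would remove.
SELECT nr.*
FROM node_revision nr
LEFT JOIN (SELECT MIN(nid) AS minnid
           FROM node_revision
           GROUP BY hash
          ) nrkeep
       ON nr.nid = nrkeep.minnid
WHERE nrkeep.minnid IS NULL;
```

On these sample rows every nid equals the minimum nid of some hash group, so the preview should come back empty. Keying on MIN(nid) alone cannot separate rows that share an nid.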

EDIT:

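Based on the example now in the question, you can use timestamp to pick the row to keep for each nid and hash:

delete nr from node_revision nr left join
               (SELECT hash, nid, min(timestamp) as mintimestamp
                FROM node_revision
                GROUP BY hash, nid
               ) nrkeep
               on nr.hash = nrkeep.hash and
                  nr.nid = nrkeep.nid and
                  nr.timestamp = nrkeep.mintimestamp
    where nrkeep.hash is null;

Note that this only removes duplicates whose timestamps differ. Rows with the same nid, hash, and timestamp all match the join and survive.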

edited Feb 25, 2014 at 21:42

answered Feb 25, 2014 at 21:19

Gordon Linoff


4 Comments

Steven Matthews Over a year ago

No, that isn't doing what I want. It leaves multiple rows with the same nid and the same hash value, which I do not want.

Steven Matthews Over a year ago

For example, say there are three rows. If there is an nid of 2 on all three rows, and say there was a hash value of 123456 on two of the rows, and 1234567 on the third row, only one of the 123456 rows would survive (the first in the database), and the 1234567 row.

Gordon Linoff Over a year ago

@AndrewAlexander . . . The two versions I left in the answer are equivalent to the query in your question (except they actually should work). Neither does what you say in the comment. You really need some column to distinguish between the rows.

Steven Matthews Over a year ago

Isn't there some way to do this based on the fact that the nid and the hash values are the same?
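
One possible way to act on the last two comments, offered as a sketch rather than a tested fix: give the rows a temporary distinguishing column, delete against it, then drop it. The column name row_id and the alias older are hypothetical, and the sketch assumes node_revision has no existing primary key or AUTO_INCREMENT column (if it has one, that key could serve as the tie-breaker instead). The numbers are assigned in whatever order MySQL reads the existing rows, which is not guaranteed to be insertion order, so back up the table first.

```sql
-- Hypothetical temporary surrogate key so that otherwise identical rows differ.
-- Assumes the table has no primary key or AUTO_INCREMENT column yet.
ALTER TABLE node_revision
    ADD COLUMN row_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Delete every row for which another row with the same nid and hash
-- and a smaller row_id exists; the lowest row_id in each group survives.
DELETE nr
FROM node_revision nr
JOIN node_revision older
  ON older.nid = nr.nid
 AND older.hash = nr.hash
 AND older.row_id < nr.row_id;

-- Remove the helper column once the duplicates are gone.
ALTER TABLE node_revision DROP COLUMN row_id;
```

Because the second reference to node_revision is a joined alias rather than a subquery, this form does not run into the "can't specify target table" restriction. On the question's sample rows it would keep one 123456 row and the 1234567 row.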
