I'm trying to delete duplicate values (which will all have the same nid) based on the hash value.

I want to keep the first (oldest) row for each nid and hash combination and delete the rest.

For some reason, I get the error "You can't specify target table 'node_revision' for update in FROM clause".

I'm trying to alias my tables, but that doesn't seem to work - what am I doing wrong?

delete from node_revision
WHERE nid NOT IN(SELECT MIN(nid) FROM node_revision GROUP BY hash)

(The timestamp column is just for illustration; I don't want it used in any queries.)

|  nid  |  hash   |  timestamp  |
|   2   | 123456  |  123364600  |
|   2   | 123456  |  123364601  |
|   2   | 1234567 |  123364602  |

Rows 1 and 3 would survive in this case.
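
To make the goal concrete, here is a minimal sketch of a check that lists which (nid, hash) pairs currently have duplicates. It is not part of the original post; the column alias `copies` is a name chosen here for illustration. With the three sample rows above it would report one group: nid 2, hash 123456, with 2 copies.

```sql
-- Hypothetical check: list each (nid, hash) pair that occurs more than once.
-- "copies" is an illustrative alias, not a column of node_revision.
SELECT nid, hash, COUNT(*) AS copies
FROM node_revision
GROUP BY nid, hash
HAVING COUNT(*) > 1;
```

Running the same query after a cleanup should return no rows if the delete did what was intended.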

  • Why don't you want the timestamp used in the queries? Commented Feb 25, 2014 at 21:39
  • Because it was just to illustrate that row 1 is older than row 2. In the actual database, row 1 and row 2 will typically have the exact same timestamp. Commented Feb 25, 2014 at 21:43
  • Is it really that difficult to tell MySQL, "Hey, if something has the same nid value and the same hash value, delete one of them!" ? Commented Feb 25, 2014 at 21:44
  • Yes, it is that difficult. In general, databases work by looking at the data in a row, and there is no way in standard SQL to distinguish between two rows that have exactly the same values. Commented Feb 25, 2014 at 22:35

1 Answer

You can phrase this as a left join:

delete nr from node_revision nr left join
               (SELECT MIN(nid) as minnid
                FROM node_revision
                GROUP BY hash
               ) nrkeep
               on nr.nid = nrkeep.minnid
    where nrkeep.minnid is null;

You can also "trick" MySQL into using the subquery:

DELETE FROM node_revision
    WHERE nid NOT IN (SELECT minnid
                      FROM (SELECT MIN(nid) as minnid FROM node_revision GROUP BY hash
                           ) t
                     );

MySQL has a well-documented limitation: a subquery in an UPDATE or DELETE statement cannot select directly from the table being modified. This query gets around the limitation by wrapping the subquery in a derived table (t), so MySQL materializes the list of minnid values before the delete runs.
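
If you would rather make the materialization step explicit, a sketch along the following lines should behave the same way. It applies the same logic as the query above (keep the smallest nid per hash), and `keep_nids` is a hypothetical scratch-table name, not something from your schema. It assumes nid is never NULL, since NOT IN behaves unexpectedly when the list contains NULL.

```sql
-- keep_nids is a hypothetical name for a scratch table.
-- Step 1: store the nids to keep, so the DELETE no longer reads
-- node_revision in its own subquery.
CREATE TEMPORARY TABLE keep_nids AS
    SELECT MIN(nid) AS minnid
    FROM node_revision
    GROUP BY hash;

-- Step 2: delete everything whose nid is not in the saved list.
DELETE FROM node_revision
WHERE nid NOT IN (SELECT minnid FROM keep_nids);

-- Step 3: clean up the scratch table.
DROP TEMPORARY TABLE keep_nids;
```

The temporary table also lets you inspect the list of surviving nids before running the delete.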

EDIT:

Based on the example now in the question, you should use timestamp as follows:

delete nr from node_revision nr left join
               (SELECT hash, nid, min(timestamp) as mintimestamp
                FROM node_revision
                GROUP BY hash, nid
               ) nrkeep
               on nr.hash = nrkeep.hash and
                  nr.nid = nrkeep.nid and
                  nr.timestamp = nrkeep.mintimestamp
    where nrkeep.hash is null;
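
If the duplicate rows can be identical in every column, including timestamp, the query above cannot tell them apart. One possible workaround is to add a temporary surrogate key so that each row becomes distinguishable. The following is only a sketch: `tmp_id`, `tmp_id_key`, and the alias `older` are hypothetical names, and it assumes node_revision does not already have an AUTO_INCREMENT column (MySQL allows only one per table). If it does, that existing column could play the role of `tmp_id` and the two ALTER statements can be skipped.

```sql
-- Assumption: node_revision has no AUTO_INCREMENT column yet.
-- tmp_id is a hypothetical, temporary surrogate key; MySQL numbers
-- the existing rows when the column is added.
ALTER TABLE node_revision
    ADD COLUMN tmp_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD UNIQUE KEY tmp_id_key (tmp_id);

-- Delete every row that has a twin with the same nid and hash
-- and a smaller tmp_id, so one row per (nid, hash) remains.
DELETE nr
FROM node_revision nr
JOIN node_revision older
  ON older.nid = nr.nid
 AND older.hash = nr.hash
 AND older.tmp_id < nr.tmp_id;

-- Remove the helper column; its single-column index goes with it.
ALTER TABLE node_revision
    DROP COLUMN tmp_id;
```

Which of the identical rows survives depends on the order in which MySQL numbers them, so this only makes sense when the duplicates are interchangeable. Rows with a NULL hash are not matched by the join and are left alone.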

4 Comments

No, that isn't doing what I want: it leaves multiple rows with the same nid and the same hash value, which I do not want.
For example, say there are three rows, all with an nid of 2. If two of the rows have a hash value of 123456 and the third has 1234567, then only one of the 123456 rows (the first in the database) and the 1234567 row should survive.
@AndrewAlexander The two versions I left in the answer are equivalent to the query in your question (except that they should actually work). Neither does what you describe in the comment. You really need some column to distinguish between the rows.
Isn't there some way to do this based on the fact that the nid and the hash values are the same?
