I'm trying to remove duplicates from a MySQL table where two column values will be the same.

In this case, say the table has an id column (called nid) and a hash column, and two rows hold the same values in both:

| nid |    hash    |
|  2  |   932298   |
|  2  |   932298   |

I'd like only one of them to survive, preferably the first one inserted in the database.

I'm looking at this post, but my use case is slightly different:

MySQL remove duplicates from big database quick

I'm also open to other options.

  • Why wouldn't you just insert it again afterwards? Commented Feb 26, 2014 at 14:15
  • Do you have a primary key column? Commented Feb 26, 2014 at 14:19
  • How is this question different from stackoverflow.com/questions/22026383/…? Commented Feb 26, 2014 at 14:19
  • I felt I didn't explain it well in the other question. Commented Feb 26, 2014 at 14:34

2 Answers

ALTER IGNORE TABLE `table_name` ADD UNIQUE (`hash`)
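
Note that `ALTER IGNORE TABLE` was removed in MySQL 5.7.4, so the statement above only works on older servers. A minimal sketch of an equivalent for newer versions, assuming the table has an auto-increment column that records insertion order (called `id` here, a hypothetical name that does not appear in the question):

```sql
-- Assumption: `table_name` has an AUTO_INCREMENT column `id` (hypothetical),
-- so a lower `id` means the row was inserted earlier.

-- Delete every row for which an older row with the same hash exists.
DELETE t1
FROM `table_name` AS t1
JOIN `table_name` AS t2
  ON t1.`hash` = t2.`hash`
 AND t1.`id` > t2.`id`;

-- Once the duplicates are gone, prevent new ones from being inserted.
ALTER TABLE `table_name` ADD UNIQUE (`hash`);
```

The self-join keeps the row with the lowest `id` for each hash. Without a column like this, MySQL has no way to tell which of two identical rows was inserted first.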

13 Comments

Wouldn't hash be the primary key? There can be multiple nids, but I only want one hash value per nid (and theoretically, it should be a unique hash id /period/)
You'll have to explain this problem to me a bit more: do you need unique pairs or unique hashes?
Ok, so there will be multiple nids with the exact same value - there could be 30 different rows with an nid of 2, for example. The hash is calculated based on the properties of an object related to the database table, and should theoretically be unique - if there are two hashes that are exactly the same, that means that the two objects/rows are the same, and I only want one of them (preferably the first one inserted into the database) to remain. Is that clearer?
Can you please confirm this? You can have pairs (nid, hash) = (1,1), (1,2), (2,1), (2,2), is that right?
You could theoretically have (1, 1), (1, 2), (2, 1), (2, 2), but the way the hash is calculated, it would be almost impossible to have the same hash for rows with a different nid, so it is best to assume that the only time hash will be the same is when it has the same nid. And all rows where there are duplicate hashes can be discarded.

The simplest way of achieving it, I believe:

1) create table `table_copy` (nid int, hash varchar(255), unique(`hash`)) select distinct nid, hash from `table_name`;
2) drop table `table_name`;
3) rename table `table_copy` to `table_name`;
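
If the table has more columns than these two, a variant of the same copy-and-swap idea keeps all of them. This is only a sketch: it assumes an auto-increment column named `id` (hypothetical) to define insertion order, and `table_old` is a placeholder name.

```sql
-- Empty copy with the same columns and indexes as the original.
CREATE TABLE `table_copy` LIKE `table_name`;

-- The unique key makes INSERT IGNORE skip any row whose hash is already present.
ALTER TABLE `table_copy` ADD UNIQUE (`hash`);

-- Copy rows oldest first, so the first-inserted row per hash is the one kept.
-- Assumption: `id` is an AUTO_INCREMENT column (hypothetical name).
INSERT IGNORE INTO `table_copy`
SELECT * FROM `table_name` ORDER BY `id`;

-- Swap the tables in one statement; drop the old one after checking the result.
RENAME TABLE `table_name` TO `table_old`, `table_copy` TO `table_name`;
-- DROP TABLE `table_old`;
```

Renaming instead of dropping first means the original data stays available until the deduplicated copy has been checked.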

2 Comments

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''table_copy'(nid,hash) as select distinct nid,hash from node_revision' at line 1
@AndrewAlexander, I've edited the answer, try it now. The syntax wasn't exact, but you get the idea, yeah?
