8

Possible Duplicate:
How can I remove duplicate rows?
Remove duplicates using only a MySQL query?

I have a large table with ~14M entries. The table type is MyISAM ans not InnoDB.

Unfortunately, I have some duplicate entries in this table that I found with the following request :

SELECT device_serial, temp, tstamp, COUNT(*) c FROM up_logs GROUP BY device_serial, temp, tstamp HAVING c > 1

To avoid these duplicates in the future, I want to convert my current index to a unique constraint using SQL request :

ALTER TABLE  up_logs DROP INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL,
ALTER TABLE up_logs ADD INDEX UK_UP_LOGS_TSTAMP_DEVICE_SERIAL (  `tstamp` ,  `device_serial` )

But before that, I need to clean up my duplicates!

My question is : How can I keep only one entry of my duplicated entries? Keep in mind that my table contain 14M entries, so I would like avoid loops if it is possible.

Any comments are welcome!

2
  • possibly accepted answer on this question will be helpful for you too: stackoverflow.com/questions/18932/… Commented Oct 4, 2012 at 13:01
  • do you have any ID or something unique? can you show structure of the table? Commented Oct 4, 2012 at 13:21

3 Answers 3

4

Creating a new unique key on the over columns you need to have as uniques will automatically clean the table of any duplicates.

ALTER IGNORE TABLE `table_name`
    ADD UNIQUE KEY `key_name`(`column_1`,`column_2`);

The IGNORE part does not allow the script to terminate after the first error occurs. And the default behavior is to delete the duplicates.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you, your solution is perfect and very efficient
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
4

Since MySQL allows Subqueries in update/delete statements, but not if they refer to the table you want to update, I´d create a copy of the original table first. Then:

DELETE FROM original_table 
WHERE id NOT IN( 
    SELECT id FROM copy_table 
    GROUP BY column1, column2, ...
);

But I could imagine that copying a table with 14M entries takes some time... selecting the items to keep when copying might make it faster:

INSERT INTO copy_table 
    SELECT * FROM original_table 
    GROUP BY column1, column2, ...;

and then

DELETE FROM original_table 
WHERE id IN(
    SELECT id FROM copy_table
);

It was some time since I used MySQL and SQL in general last time, so I´m quite sure that there is something with better performance - but this should work ;)

Comments

1

This is how you can delete duplicate rows... I'll write you my example and you'll need to apply to your code. I have Actors table with ID and I want to delete the rows with repeated first_name

mysql> select actor_id, first_name from actor_2;
+----------+-------------+
| actor_id | first_name  |
+----------+-------------+
|        1 | PENELOPE    |
|        2 | NICK        |
|        3 | ED          |
....
|      199 | JULIA       |
|      200 | THORA       |
+----------+-------------+

200 rows in set (0.00 sec)

-Now I use a Variable called @a to get the ID if the next row have the same first_name(repeated, null if it's not).

mysql> select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name;
+---------------+----------------+
|  first_names  | @a:=first_name |
+---------------+----------------+
|          NULL | ADAM           |
|            71 | ADAM           |
|          NULL | AL             |
|          NULL | ALAN           |
|          NULL | ALBERT         |
|           125 | ALBERT         |
|          NULL | ALEC           |
|          NULL | ANGELA         |
|           144 | ANGELA         |
...
|          NULL | WILL           |
|          NULL | WILLIAM        |
|          NULL | WOODY          |
|            28 | WOODY          |
|          NULL | ZERO           |
+---------------+----------------+
200 rows in set (0.00 sec)

-Now we can get only duplicates ID:

    mysql> select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1;
    +-------------+
    | first_names |
    +-------------+
    |        NULL |
    |          71 |
    |        NULL |
     ...
    |          28 |
    |        NULL |
    +-------------+
    200 rows in set (0.00 sec)

-the Final Step, Lets DELETE!

mysql> delete from actor_2 where actor_id in (select first_names from (select if(first_name=@a,actor_id,null) as first_names,@a:=first_name from actor_2 order by first_name) as t1);
Query OK, 72 rows affected (0.01 sec)

-Now lets check our table:

mysql> select count(*) from actor_2 group by first_name;
+----------+
| count(*) |
+----------+
|        1 |
|        1 |
|        1 |
...
|        1 |
+----------+
128 rows in set (0.00 sec)

it works, if you have any question write me back

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.