25

I have a table of games, which is described as follows:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| date          | date        | NO   |     | NULL    |                |
| time          | time        | NO   |     | NULL    |                |
| hometeam_id   | int(11)     | NO   | MUL | NULL    |                |
| awayteam_id   | int(11)     | NO   | MUL | NULL    |                |
| locationcity  | varchar(30) | NO   |     | NULL    |                |
| locationstate | varchar(20) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

But each game has a duplicate entry in the table somewhere, because each game was in the schedules for two teams. Is there a sql statement I can use to look through and delete all the duplicates based on identical date, time, hometeam_id, awayteam_id, locationcity, and locationstate fields?

9 Answers 9

45

You should be able to do a correlated subquery to delete the data. Find all rows that are duplicates and delete all but the one with the smallest id. For MYSQL, an inner join (functional equivalent of EXISTS) needs to be used, like so:

delete games from games inner join 
    (select  min(id) minid, date, time,
             hometeam_id, awayteam_id, locationcity, locationstate
     from games 
     group by date, time, hometeam_id, 
              awayteam_id, locationcity, locationstate
     having count(1) > 1) as duplicates
   on (duplicates.date = games.date
   and duplicates.time = games.time
   and duplicates.hometeam_id = games.hometeam_id
   and duplicates.awayteam_id = games.awayteam_id
   and duplicates.locationcity = games.locationcity
   and duplicates.locationstate = games.locationstate
   and duplicates.minid <> games.id)

To test, replace delete games from games with select * from games. Don't just run a delete on your DB :-)

Sign up to request clarification or add additional context in comments.

2 Comments

I tried the select version of this, and it looked like exactly what I wanted to get rid of, but when I actually ran it with "delete from," it threw an error and told me "Error Code: 1093. You can't specify target table 'game' for update in FROM clause" Any ideas?
Try the updated answer, I replaced EXISTS with an INNER JOIN delete. I think mysql might be having trouble with a delete and EXISTS clause.
13

You can try such query:

DELETE FROM table_name AS t1
WHERE EXISTS (
 SELECT 1 FROM table_name AS t2 
 WHERE t2.date = t1.date 
 AND t2.time = t1.time 
 AND t2.hometeam_id = t1.hometeam_id 
 AND t2.awayteam_id = t1.awayteam_id 
 AND t2.locationcity = t1.locationcity 
 AND t2.id > t1.id )

This will leave in database only one example of each game instance which has the smallest id.

1 Comment

Great solution! However, in the last line it should be < in order to delete the smallest ID.
7

The best thing that worked for me was to recreate the table.

CREATE TABLE newtable SELECT * FROM oldtable GROUP BY field1,field2;

You can then rename.

3 Comments

This is by far the best and more straight forward solution. You can't go wrong using this.
The one downside to this is that you lose the constraints, but you can just TRUNCATE oldtable and copy everything back into it from newtable, so it works like a charm
Most secure solution, much better than a DELETE statement, IMO.
5

To get list of duplicate entried matching two fields

select t.ID, t.field1, t.field2
from (
  select field1, field2
  from table_name
  group by field1, field2
  having count(*) > 1) x, table_name t
where x.field1 = t.field1 and x.field2 = t.field2
order by t.field1, t.field2

And to delete all the duplicate only

DELETE x 
FROM table_name x
JOIN table_name y
ON y.field1= x.field1
AND y.field2 = x.field2
AND y.id < x.id;

1 Comment

The above query does expected but it removes the last row from the resultset. So I made a correction in the query as follows: DELETE x FROM table_name x JOIN table_name y ON y.field1= x.field1 AND y.field2 = x.field2 AND y.id > x.id;
4
select orig.id,
       dupl.id
from   games   orig, 
       games   dupl
where  orig.date   =    dupl.date
and    orig.time   =    dupl.time
and    orig.hometeam_id = dupl.hometeam_id
and    orig. awayteam_id = dupl.awayeam_id
and    orig.locationcity = dupl.locationcity
and    orig.locationstate = dupl.locationstate
and    orig.id     <    dupl.id

this should give you the duplicates; you can use it as a subquery to specify IDs to delete.

Comments

3

AS long as you are not getting id (primary key) of the table in your select query and the other data is exact same you can use SELECT DISTINCT to avoid getting duplicate results.

Comments

2
delete from games 
   where id not in 
   (select max(id)  from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate 
    );

Workaround

select max(id)  id from games 
    group by date, time, hometeam_id, awayteam_id, locationcity, locationstate
into table temp_table;

delete from games where id in (select id from temp);

3 Comments

This approach would only delete one duplicate row per game, regardless of how many duplicate rows existed for the game.
This gives me the same error that I got from @Neville K's post. ERROR 1093 (HY000): You can't specify target table 'game' for update in FROM clause
Sometimes you can not update values if it changes resultset. I've added workaround for that.
1
DELETE FROM table
WHERE id = 
    (SELECT t.id
    FROM table as t
    JOIN (table as tj ON (t.date = tj.data
                          AND t.hometeam_id = tj.hometeam_id
                          AND t.awayteam_id = tj.awayteam_id
                          ...))

2 Comments

This is very complicated version of simple delete from table
oops, missed the t.id <> tj.id in the JOIN.
1
DELETE FROM tbl
 USING tbl, tbl t2
 WHERE tbl.id > t2.id
  AND t2.field = tbl.field;

in your case:

DELETE FROM games
 USING games tbl, games t2
 WHERE tbl.id > t2.id
  AND t2.date = tbl.date
  AND t2.time = tbl.time
  AND t2.hometeam_id = tbl.hometeam_id
  AND t2.awayteam_id = tbl.awayteam_id
  AND t2.locationcity = tbl.locationcity
  AND t2.locationstate = tbl.locationstate;

reference: https://dev.mysql.com/doc/refman/5.7/en/delete.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.