MySQL Duplicate Rows (Duplicate detected using 2 columns)

Question

How do I remove duplicates in this setup?

id    A       B 
----------------
1     apple   2  
2     orange  1       
3     apple   2   
4     apple   1

Here I want to remove the duplicate (apple, 2), which occurs twice. The id numbers are unique; if they were not, I would simply use the DISTINCT keyword. Can I somehow make a key out of columns A and B and then use DISTINCT on that to get what I need? Many thanks for your replies.

Thank you all for the replies again. I think I have a good idea now how to proceed.
– jason, Commented Nov 25, 2009 at 20:01

davek · Accepted Answer · 2009-11-25 20:54:05Z

22

delete from myTable 
where id not in
(select min(id)
from myTable
group by A, B)

i.e. the select in brackets returns the first (lowest) id for each grouping of A and B; deleting all ids that are not in this set will remove all occurrences of an A-plus-B combination that are "subsequent" to its first occurrence.

EDIT: this syntax seems to be problematic: see bug report:

http://bugs.mysql.com/bug.php?id=5037

A possible workaround is to do this:

delete from myTable 
where id not in
(
      select minid from 
      (select min(id) as minid from myTable group by A, B) as newtable
)
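
The workaround above can be tried end to end on the four sample rows from the question. This is only a minimal sketch, not a MySQL session: it assumes Python's built-in sqlite3 module as a stand-in engine, and SQLite does not raise MySQL's error 1093, so it checks the logic of the statement rather than the MySQL restriction. The table and column names come from the answer; the helper name dedupe_keep_min_id is hypothetical.

```python
import sqlite3


def dedupe_keep_min_id(conn):
    """Delete every row whose id is not the smallest id in its (A, B) group."""
    conn.execute(
        """
        DELETE FROM myTable
        WHERE id NOT IN (
            SELECT minid FROM
            (SELECT MIN(id) AS minid FROM myTable GROUP BY A, B) AS newtable
        )
        """
    )


# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

dedupe_keep_min_id(conn)

# Rows left after de-duplication, ordered by id.
remaining = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(remaining)
```

Only the row with id 3 should disappear here, since it repeats the (apple, 2) pair already held by id 1.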

edited Nov 25, 2009 at 20:54

answered Nov 25, 2009 at 19:53

davek


7 Comments

Benjamin Cox Over a year ago

How does this perform relative to my answer below? I'm not enough of a DB guru to analyze it...

ps. Over a year ago

Nice. This will remove only the row where id=3, rather than both rows 1 and 3.

davek Over a year ago

@Benjamin: I'm not sure: my guess is that it will depend on the data distribution. But this version should be portable to other databases and for me - at least! - it's more readable.

Benjamin Cox Over a year ago

Definitely more readable - glad to hear it's more portable as well. I'll be testing this out next week on my own data set. Thanks, Dave!

jason Over a year ago

I get this error when using this construct. I can always use a temp table, of course. ERROR 1093 (HY000): You can't specify target table 'myTable' for update in FROM clause.


Peter M · Accepted Answer · 2013-01-23 20:14:01Z

6

Yet another approach (from http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/): add a unique index, then drop it:

ALTER IGNORE TABLE mytable ADD UNIQUE INDEX tmpindex (A,B);
ALTER TABLE mytable DROP INDEX tmpindex;

The IGNORE keyword is a MySQL extension that makes the ALTER drop rows that violate the UNIQUE constraint instead of just failing.
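
As far as I know, the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so on newer servers the same idea has to be spelled differently. The sketch below shows a related technique, not the answer's exact statement: copy the rows into a table that carries a UNIQUE (A, B) constraint and let the engine skip the violators. It assumes Python's built-in sqlite3 module as a stand-in, where the skipping insert is written INSERT OR IGNORE (MySQL spells it INSERT IGNORE). The table name mytable_dedup is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

# Hypothetical replacement table with the uniqueness rule built in.
conn.execute(
    "CREATE TABLE mytable_dedup ("
    "id INTEGER PRIMARY KEY, A TEXT, B INTEGER, UNIQUE (A, B))"
)

# Rows are copied in id order; a row that repeats an (A, B) pair is skipped.
conn.execute(
    "INSERT OR IGNORE INTO mytable_dedup (id, A, B) "
    "SELECT id, A, B FROM mytable ORDER BY id"
)

# Swap the de-duplicated table into place.
conn.execute("DROP TABLE mytable")
conn.execute("ALTER TABLE mytable_dedup RENAME TO mytable")

kept = conn.execute("SELECT id, A, B FROM mytable ORDER BY id").fetchall()
print(kept)
```

Unlike the answer, this sketch leaves the unique constraint on the table, which also stops new duplicates from being inserted later.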

answered Jan 23, 2013 at 20:14

Peter M


Larry Lustig · Accepted Answer · 2009-11-25 19:54:21Z

2

DELETE FROM fruit_table FT1
WHERE EXISTS
(
    SELECT * FROM fruit_table FT2 
    WHERE FT2.fruit_name_column = FT1.fruit_name_column
    AND   FT2.fruit_integer_column = FT1.fruit_integer_column
    AND   FT2.id <> FT1.id
)

This assumes you don't care which of the duplicate records is removed.
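
One thing to watch with the statement above: because it compares ids with <>, every member of a duplicate group has a matching partner, so it appears to delete all copies rather than all but one. A variant that compares with < keeps the lowest id in each group. The sketch below tries that variant, assuming Python's built-in sqlite3 module as a stand-in engine; it refers to the outer table by name instead of the FT1 alias. MySQL itself may still reject a subquery on the table being deleted from (error 1093), so treat this as a check of the logic only.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruit_table ("
    "id INTEGER PRIMARY KEY, fruit_name_column TEXT, fruit_integer_column INTEGER)"
)
conn.executemany(
    "INSERT INTO fruit_table (id, fruit_name_column, fruit_integer_column) "
    "VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

# Delete a row only when an identical row with a smaller id exists,
# so the lowest id of every (name, integer) pair survives.
conn.execute(
    """
    DELETE FROM fruit_table
    WHERE EXISTS (
        SELECT 1 FROM fruit_table AS FT2
        WHERE FT2.fruit_name_column = fruit_table.fruit_name_column
          AND FT2.fruit_integer_column = fruit_table.fruit_integer_column
          AND FT2.id < fruit_table.id
    )
    """
)

survivors = conn.execute(
    "SELECT id, fruit_name_column, fruit_integer_column "
    "FROM fruit_table ORDER BY id"
).fetchall()
print(survivors)
```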

answered Nov 25, 2009 at 19:54

Larry Lustig


1 Comment

Mitya Over a year ago

Errors for me in MySQL - "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FT1"

Benjamin Cox · Accepted Answer · 2009-11-25 19:57:41Z

1

DELETE
FROM mytable
USING mytable, mytable AS vtable
WHERE vtable.id > mytable.id
AND mytable.A = vtable.A
AND mytable.B = vtable.B

answered Nov 25, 2009 at 19:57

Benjamin Cox


1 Comment

Mitya Over a year ago

Errors for me in MySQL - "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'tbl USING..."

Pablo Santa Cruz · Accepted Answer · 2009-11-25 19:54:29Z

0

You could use a temporary table with the data you want:

-- myTable is assumed to be the table being de-duplicated
insert into temp_table
select min(id), A, B
from myTable
group by A, B
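
The answer stops once the temporary table is filled. One way the remaining steps could look is sketched below, assuming Python's built-in sqlite3 module as a stand-in engine and assuming the source table is called myTable (the answer does not name it). The sketch empties the original table and copies the de-duplicated rows back.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

# Step 1: one row per (A, B) pair, keeping the smallest id of each group.
conn.execute(
    "CREATE TEMP TABLE temp_table AS "
    "SELECT MIN(id) AS id, A, B FROM myTable GROUP BY A, B"
)

# Step 2: empty the original table.
conn.execute("DELETE FROM myTable")

# Step 3: copy the de-duplicated rows back and discard the temporary table.
conn.execute("INSERT INTO myTable (id, A, B) SELECT id, A, B FROM temp_table")
conn.execute("DROP TABLE temp_table")

restored = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(restored)
```

On a real server, steps 2 and 3 would probably belong in a single transaction so that a failure in between does not leave the table empty.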

answered Nov 25, 2009 at 19:54

Pablo Santa Cruz


BryanD · Accepted Answer · 2009-11-25 19:54:48Z

0

I'm not exactly sure what you're asking here. If you don't want duplicates of the A and B columns, then do just what you mentioned: SELECT DISTINCT A, B FROM XXX. Maybe you could post an example of the type of result you would like to see.

answered Nov 25, 2009 at 19:54

BryanD


1 Comment

jason Over a year ago

I guess "group by" is what I was missing, the other posts have clarified this.
