How do I remove duplicates in this setup?

id    A       B 
----------------
1     apple   2  
2     orange  1       
3     apple   2   
4     apple   1 

Here I want to remove the duplicate (apple, 2), which occurs twice. The id numbers are unique; if they were not, I would just use the DISTINCT keyword. Can I somehow make a key out of columns A and B and then use DISTINCT on that to get what I need? Many thanks for your replies.

1
  • Thank you all for the replies again. I think I have a good idea now how to proceed. Commented Nov 25, 2009 at 20:01

6 Answers

22
delete from myTable 
where id not in
(select min(id)
from myTable
group by A, B)

That is, the select in brackets returns the first id for each grouping of A and B; deleting all ids that are not in this set removes every occurrence of an A-plus-B combination that is "subsequent" to its first occurrence.

EDIT: this syntax seems to be problematic in MySQL; see this bug report:

http://bugs.mysql.com/bug.php?id=5037

A possible workaround is to do this:

delete from myTable 
where id not in
(
      select minid from 
      (select min(id) as minid from myTable group by A, B) as newtable
) 
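
As a quick sanity check, here is a minimal sketch that runs the derived-table form against the sample rows from the question. It uses SQLite through Python's sqlite3 module purely as a stand-in (an assumption on my part; the question is about MySQL, and SQLite does not reproduce the MySQL-specific error). The helper names make_sample_db and dedupe_keep_min_id are made up for illustration.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table holding the four rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
    conn.executemany(
        "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
        [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
    )
    conn.commit()
    return conn


def dedupe_keep_min_id(conn):
    """Delete every row whose id is not the smallest id for its (A, B) pair."""
    conn.execute(
        """
        DELETE FROM myTable
        WHERE id NOT IN (
            SELECT minid FROM
            (SELECT MIN(id) AS minid FROM myTable GROUP BY A, B) AS newtable
        )
        """
    )
    conn.commit()


conn = make_sample_db()
dedupe_keep_min_id(conn)
remaining = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(remaining)
```

Only the row with id 3 should be gone afterwards, since it is the second copy of (apple, 2).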

7 Comments

How does this perform relative to my answer below? I'm not enough of a DB guru to analyze it...
Nice. This removes only the row where id=3, not both rows 1 and 3.
@Benjamin: I'm not sure: my guess is that it will depend on the data distribution. But this version should be portable to other databases and for me - at least! - it's more readable.
Definitely more readable - glad to hear it's more portable as well. I'll be testing this out next week on my own data set. Thanks, Dave!
I get this error when using this construct. I can always use a temp table, of course. ERROR 1093 (HY000): You can't specify target table 'myTable' for update in FROM clause.
6

Yet another approach (from http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/): add a unique index, then drop it:

ALTER IGNORE TABLE mytable ADD UNIQUE INDEX tmpindex (A,B);
ALTER TABLE mytable DROP INDEX tmpindex;

The IGNORE keyword is a MySQL extension that makes the statement drop rows that violate the UNIQUE constraint instead of just failing.

2
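DELETE FROM fruit_table FT1
WHERE EXISTS
(
    SELECT * FROM fruit_table FT2
    WHERE FT2.fruit_name_column = FT1.fruit_name_column
    AND   FT2.fruit_integer_column = FT1.fruit_integer_column
    AND   FT2.id < FT1.id  -- an earlier copy exists, so FT1 is the redundant one
)

This keeps the row with the lowest id in each group of duplicates. (Comparing the ids with <> instead would delete every copy, since each duplicate row matches the other.)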

1 Comment

Errors for me in MySQL - "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'FT1"
1
DELETE
FROM mytable
USING mytable, mytable AS vtable
WHERE vtable.id > mytable.id
AND mytable.A = vtable.A
AND mytable.B = vtable.B

1 Comment

Errors for me in MySQL - "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'tbl USING..."
0

You could use a temporary table with the data you want:

insert into temp_table
select min(id), A, B
from myTable
group by A, B
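
The insert is only the first half of the idea; the deduplicated rows still have to replace the originals. A minimal sketch of the full round trip follows, run against the sample rows from the question. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption), and the table name myTable, the column types, and the helper dedupe_via_temp_table are hypothetical.

```python
import sqlite3


def dedupe_via_temp_table(conn):
    """Copy one row per (A, B) into a temp table, then rebuild myTable from it."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE temp_table (id INTEGER, A TEXT, B INTEGER)")
    # Keep the lowest id for each distinct (A, B) combination.
    cur.execute(
        "INSERT INTO temp_table SELECT MIN(id), A, B FROM myTable GROUP BY A, B"
    )
    # Swap the original contents for the deduplicated copy.
    cur.execute("DELETE FROM myTable")
    cur.execute("INSERT INTO myTable (id, A, B) SELECT id, A, B FROM temp_table")
    conn.commit()
    cur.execute("DROP TABLE temp_table")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)
conn.commit()

dedupe_via_temp_table(conn)
rows = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(rows)
```

On a large table, emptying and refilling the original is more work than deleting only the duplicates, so this is mostly useful when the direct delete is not accepted by the server.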

0

I'm not exactly sure what you're asking here. If you don't want duplicates of the A and B columns, then do just what you mentioned: SELECT DISTINCT A, B FROM XXX. Maybe you could post an example of the type of result you would like to see.

1 Comment

I guess "group by" is what I was missing; the other posts have clarified this.
