Delete multiple rows without knowing the names of the rows

Question

I have a huge database that contains writer names.

There are multiple records in my database but I don't know which rows are duplicate.

How can I delete duplicate rows without knowing the value?

Brian DeMilia · Accepted Answer · 2014-12-21 14:02:32Z

1

Try:

delete from tbl
 where writer_id in
       (select writer_id
          from (select * from tbl) t
         where exists (select 1
                  from (select * from tbl) x
                 where x.writer_name = t.writer_name
                   and t.writer_id < x.writer_id));

See demo: http://sqlfiddle.com/#!2/845ca3/1/0

This keeps the first row for each writer_name, in order of writer_id ascending.

The EXISTS subquery will run for every row, however. You could also try:

delete t
from
   tbl t
left join ( select writer_name, min(writer_id) as writer_id
            from tbl
            group by writer_name ) x
                                     on t .writer_name = x.writer_name
                                    and x.writer_id = t .writer_id
where
   x.writer_name is null;

demo: http://sqlfiddle.com/#!2/075f9/1/0

If there are no foreign key constraints on the table you could also use create table as select to create a new table without the duplicate entries, drop the old table, and rename the new table to that of the old table's name, getting what you want in the end. (This would not be the way to go if this table has foreign keys though)

That would look like this:

create table tbl2 as (select distinct writer_name from tbl);

drop table tbl;

alter table tbl2 add column writer_id int not null auto_increment first,
add primary key (writer_id);

rename table tbl2 to tbl;

demo: http://sqlfiddle.com/#!2/8886d/1/0

edited Dec 21, 2014 at 14:02

answered Dec 21, 2014 at 2:38

Brian DeMilia

13.3k1 gold badge25 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

salep Over a year ago

I am working with one table and I want to delete multiple records in a particular column (writer_name) in this table. Does it make my question clearer?

Brian DeMilia Over a year ago

What other columns are on the table? and what determines which row to keep, of those where the writer appears again 2,3,4... times?

salep Over a year ago

i.imgur.com/hjtxNyo.png See, Franz Kafka exists twice. I want to delete the second(or third, fourth, depend on the situation) values.

Brian DeMilia Over a year ago

@salep see my edit, should work if you want to keep each writer_name with the lowest writer_id in situations where the same name has 2+ IDs

salep Over a year ago

Update : It works well with small tables, but I can't make it work with a big table that has 50,000+ rows. It takes forever to run the query. What could be the reason for this?

|

Kritner · Accepted Answer · 2014-12-21 02:03:58Z

0

SELECT a.*
FROM the_table a
INNER JOIN the_table b ON a.field1 = b.field1 AND (etc)
WHERE a.pk != b.pk

hope that query can solve your problem.

edited Dec 21, 2014 at 2:03

Kritner

13.8k11 gold badges53 silver badges76 bronze badges

answered Dec 21, 2014 at 2:01

user3827706

7

1 Comment

salep Over a year ago

I can't make it work. I want to delete rows that contain the same value. Could you edit your answer if it's possible, please?

user3827706 · Accepted Answer · 2014-12-21 03:12:21Z

0

DELETE a 
FROM tbl a 
LEFT JOIN tbl b 
ON a.field1 = b.field1 (etc)
WHERE a.id < b.id

this must help you

answered Dec 21, 2014 at 3:12

user3827706

7

Collectives™ on Stack Overflow

Delete multiple rows without knowing the names of the rows

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related