SQL checking duplicates in one column and deleting another

Question

I need to delete around 300,000 duplicates in my database. I want to check the Card_id column for duplicates, then check for duplicate timestamps, and then delete the extra copies so that only one of each remains. Example:

| Card_id | Time |
|---------|------|
| 1234    | 5:30 |
| 1234    | 5:45 |
| 1234    | 5:30 |
| 1234    | 5:45 |

So the remaining data would be:

| Card_id | Time |
|---------|------|
| 1234    | 5:30 |
| 1234    | 5:45 |

I have tried several different DELETE statements, as well as merging the data into a new table, but with no luck.

UPDATE: Got it working!

Alright, after many failures I got this to work on DB2:

    -- Delete every row after the first in each (card_id, time) group.
    delete from (
        select card_id, time,
               row_number() over (partition by card_id, time) as rn
        from card_table
    ) as A
    where rn > 1;

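row_number() numbers the rows within each (card_id, time) group starting at 1, so rn only goes above 1 when a group contains duplicates. Every row with rn > 1 (the second, third, and later copies) is deleted, leaving exactly one row per group.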
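The same ROW_NUMBER() idea can be tried out locally with Python's built-in sqlite3 module. This is a sketch, not the DB2 statement above: SQLite cannot DELETE from a derived table, so this version numbers the rows in a subquery and deletes by rowid instead, and it assumes SQLite 3.25 or later (needed for window functions). The table name card_table comes from the question; the extra reader column and the sample rows are hypothetical, included to show that deleting rows this way leaves every other column intact.

```python
import sqlite3

# Hypothetical sample data: (card_id, time, reader). The reader column stands
# in for the other columns that must survive the cleanup.
SAMPLE_ROWS = [
    (1234, "5:30", "A"),
    (1234, "5:45", "B"),
    (1234, "5:30", "A"),
    (1234, "5:45", "B"),
]


def delete_duplicates_row_number(conn: sqlite3.Connection) -> None:
    """Delete every row after the first in each (card_id, time) group."""
    # SQLite cannot DELETE from a derived table, so number the rows in a
    # subquery (rn = 1, 2, ... per group, in rowid order) and delete the
    # rowids whose number is greater than 1.
    conn.execute(
        """
        DELETE FROM card_table
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (
                           PARTITION BY card_id, time ORDER BY rowid
                       ) AS rn
                FROM card_table
            )
            WHERE rn > 1
        )
        """
    )


def demo_row_number() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE card_table (card_id INTEGER, time TEXT, reader TEXT)")
    conn.executemany("INSERT INTO card_table VALUES (?, ?, ?)", SAMPLE_ROWS)
    delete_duplicates_row_number(conn)
    rows = conn.execute(
        "SELECT card_id, time, reader FROM card_table ORDER BY time"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_row_number())
```

Running it prints [(1234, '5:30', 'A'), (1234, '5:45', 'B')]: one row per (card_id, time) pair, with the reader value still attached.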

Are we dealing strictly with duplicates, or can you have three (or more) rows with the same values? – PM 77-1, Commented Jul 31, 2013 at 19:22

There can be duplicate Card_id values, but each must have a unique Time. There are also about 34 other columns that I need to keep. – Nexus, Commented Jul 31, 2013 at 19:27

Gordon Linoff · Accepted Answer · 2013-07-31 19:24:16Z · Score: 2

I strongly suggest you take this approach:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

insert into t(card_id, time)
    select *
    from tokeep;

That is, store the data you want, truncate the table, and then regenerate it from the stored copy. Because the table is truncated rather than dropped, you keep its triggers, permissions, and other things linked to it.

This approach should also be faster than deleting hundreds of thousands of duplicates in place.

If you are going to rebuild the table anyway, you ought to add a proper id column as well:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

-- MySQL requires an AUTO_INCREMENT column to be defined as a key
alter table t add column id int auto_increment primary key;

insert into t(card_id, time)
    select *
    from tokeep;
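The keep/truncate/regenerate steps of the first version can be sketched with Python's built-in sqlite3 module. This is an adaptation, not the MySQL code above: SQLite has no TRUNCATE TABLE statement, so an unqualified DELETE FROM is used to empty the table while keeping its definition. The table names t and tokeep come from the answer; the sample rows are hypothetical. Like the answer, it only carries card_id and time across, so any other columns would have to be added to the DISTINCT list.

```python
import sqlite3

# Hypothetical sample data matching the question's example.
SAMPLE_ROWS = [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")]


def rebuild_from_distinct(conn: sqlite3.Connection) -> None:
    """Keep the distinct (card_id, time) pairs, empty t, and reload it."""
    # Store the data you want to keep in a temporary table.
    conn.execute("CREATE TEMP TABLE tokeep AS SELECT DISTINCT card_id, time FROM t")
    # SQLite has no TRUNCATE TABLE; an unqualified DELETE empties the table
    # but keeps its definition, indexes, and triggers.
    conn.execute("DELETE FROM t")
    # Regenerate the table from the saved copy, then clean up.
    conn.execute("INSERT INTO t (card_id, time) SELECT card_id, time FROM tokeep")
    conn.execute("DROP TABLE tokeep")


def demo_rebuild() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (card_id INTEGER, time TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", SAMPLE_ROWS)
    rebuild_from_distinct(conn)
    rows = conn.execute("SELECT card_id, time FROM t ORDER BY time").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_rebuild())
```

The four sample rows collapse to [(1234, '5:30'), (1234, '5:45')], matching the expected result in the question.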

answered Jul 31, 2013 at 19:24


Robert · Accepted Answer · 2013-07-31 19:32:48Z · Score: 0

If you don't have a primary key or candidate key, there is probably no way to do this with a single command. Try the solution below.

Create a table holding one copy of each duplicated (Card_id, Time) pair:

  select Card_id,Time
  into COPY_YourTable
  from YourTable
  group by Card_id,Time
  having count(1)>1

Remove every copy of those duplicated rows from YourTable, using COPY_YourTable:

  delete from YourTable
  where exists 
   (
     select 1
     from COPY_YourTable c
     where  c.Card_id = YourTable.Card_id
     and c.Time = YourTable.Time
   )

Copy the data back without duplicates:

  insert into YourTable (Card_id, Time)
  select Card_id, Time
  from COPY_YourTable
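The three steps above can be sketched end to end with Python's built-in sqlite3 module. This is an adaptation: SQLite does not support SELECT ... INTO, so CREATE TABLE ... AS builds the helper table instead, and the helper is dropped at the end. The names YourTable and COPY_YourTable come from the answer; the sample rows (including the non-duplicated 5678 row) are hypothetical.

```python
import sqlite3

# Hypothetical sample data: two duplicated pairs plus one unique row.
SAMPLE_ROWS = [
    (1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45"), (5678, "6:00"),
]


def dedupe_via_copy(conn: sqlite3.Connection) -> None:
    # 1. One row for every (Card_id, Time) pair that occurs more than once.
    #    SQLite has no SELECT ... INTO, so CREATE TABLE ... AS is used instead.
    conn.execute(
        """
        CREATE TABLE COPY_YourTable AS
        SELECT Card_id, Time FROM YourTable
        GROUP BY Card_id, Time
        HAVING COUNT(1) > 1
        """
    )
    # 2. Delete every copy of those pairs from the original table.
    conn.execute(
        """
        DELETE FROM YourTable
        WHERE EXISTS (
            SELECT 1 FROM COPY_YourTable c
            WHERE c.Card_id = YourTable.Card_id
              AND c.Time = YourTable.Time
        )
        """
    )
    # 3. Put exactly one copy of each pair back, then drop the helper table.
    conn.execute(
        "INSERT INTO YourTable (Card_id, Time) SELECT Card_id, Time FROM COPY_YourTable"
    )
    conn.execute("DROP TABLE COPY_YourTable")


def demo_copy() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (Card_id INTEGER, Time TEXT)")
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", SAMPLE_ROWS)
    dedupe_via_copy(conn)
    rows = conn.execute(
        "SELECT Card_id, Time FROM YourTable ORDER BY Card_id, Time"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_copy())
```

Rows that were never duplicated are not touched by step 2, so only the duplicated pairs are deleted and re-inserted.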

edited Jul 31, 2013 at 19:32

answered Jul 31, 2013 at 19:27

