I need to delete around 300,000 duplicate rows from a table in my database. Rows count as duplicates when they share the same Card_id and the same Time. For each set of duplicates I want to keep one copy and delete the rest. Example:

| Card_id | Time |    
| 1234    | 5:30 |     
| 1234    | 5:45 |    
| 1234    | 5:30 |    
| 1234    | 5:45 |

So remaining data would be:

| Card_id | Time |     
| 1234    | 5:30 |     
| 1234    | 5:45 |

I have tried several different DELETE statements, as well as merging into a new table, but with no luck.

UPDATE: Got it working!

Alright, after many failures I got this to work on DB2:

delete from (
    select card_id, time,
           row_number() over (partition by card_id, time) as rn
    from card_table
) as A
where rn > 1

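row_number() numbers the rows within each (card_id, time) group starting at 1, so rn is greater than 1 only for the extra copies. Those rows are deleted, which leaves exactly one row per (card_id, time) pair.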
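To see how the numbering plays out on the example rows, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite stands in for DB2 purely for illustration (an assumption, not the setup in the question). SQLite cannot delete from a subquery, so the sketch removes the extra copies through SQLite's implicit rowid instead, and it adds `order by rowid` inside the window so the numbering is deterministic. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table card_table (card_id integer, time text)")
conn.executemany(
    "insert into card_table (card_id, time) values (?, ?)",
    [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")],
)

# Number the rows inside each (card_id, time) group:
# the first copy gets rn = 1, any extra copies get 2, 3, ...
numbered = conn.execute(
    """
    select card_id, time,
           row_number() over (partition by card_id, time order by rowid) as rn
    from card_table
    order by time, rn
    """
).fetchall()

# SQLite stand-in for "delete from (...) where rn > 1":
# find the rowids of the extra copies and delete those rows.
conn.execute(
    """
    delete from card_table
    where rowid in (
        select rid from (
            select rowid as rid,
                   row_number() over (partition by card_id, time order by rowid) as rn
            from card_table
        )
        where rn > 1
    )
    """
)

remaining = conn.execute(
    "select card_id, time from card_table order by time"
).fetchall()
print(numbered)
print(remaining)
```

Each (card_id, time) pair in the sample has one row with rn = 1 and one with rn = 2, and only the rn = 1 rows survive the delete.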

  • Do you have an id column to identify records uniquely? Commented Jul 31, 2013 at 19:13
  • There is no unique id for this data. Commented Jul 31, 2013 at 19:17
  • Are we dealing strictly with pairs of duplicates, or can three (or more) rows share the same values? Commented Jul 31, 2013 at 19:22
  • A Card_id can appear more than once, but each occurrence must have a unique Time. There are about 34 other columns that I need to keep too. Commented Jul 31, 2013 at 19:27

2 Answers

I strongly suggest you take this approach:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

insert into t(card_id, time)
    select card_id, time
    from tokeep;

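That is, store the rows you want to keep, truncate the table, and then reload it. Because the table is truncated rather than dropped and recreated, its triggers, permissions, and other linked objects stay in place. Note that tokeep as written holds only card_id and time; any other columns you need to preserve would also have to be carried through the temporary table.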

This approach should also be faster than deleting many, many duplicates.
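As a rough illustration of the store, truncate, reload sequence, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite is only a stand-in engine (an assumption): it has no TRUNCATE, so an unfiltered DELETE takes its place, and the index `idx_t_card` is a hypothetical example of an object linked to the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (card_id integer, time text)")
# Hypothetical linked object, used to show that it survives the reload.
conn.execute("create index idx_t_card on t (card_id)")
conn.executemany(
    "insert into t (card_id, time) values (?, ?)",
    [(1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45")],
)

# 1. Store the rows to keep.
conn.execute(
    "create temporary table tokeep as select distinct card_id, time from t"
)
# 2. Empty the table. SQLite has no TRUNCATE, so an unfiltered DELETE stands in.
conn.execute("delete from t")
# 3. Reload from the saved copy, naming the columns explicitly.
conn.execute("insert into t (card_id, time) select card_id, time from tokeep")

rows = conn.execute("select card_id, time from t order by time").fetchall()
index_names = [
    r[0]
    for r in conn.execute(
        "select name from sqlite_master where type = 'index' and tbl_name = 't'"
    )
]
print(rows)
print(index_names)
```

Between steps 2 and 3 the data exists only in the temporary table, so on a real system it is worth taking a backup or running the steps where a failure can be rolled back.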

If you are going to do that, you ought to add a proper id column as well:

create temporary table tokeep as
    select distinct card_id, time
    from t;

truncate table t;

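-- MySQL syntax: an auto_increment column must be a key.
-- Other engines such as DB2 use an identity column instead.
alter table t add column id int auto_increment primary key;

insert into t(card_id, time)
    select card_id, time
    from tokeep;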

If the table has no primary key or candidate key, there is probably no way to do this with a single command. Try the solution below.

Create a table holding one row per duplicated (Card_id, Time) pair

  select Card_id,Time
  into COPY_YourTable
  from YourTable
  group by Card_id,Time
  having count(1)>1

Delete every copy of each duplicated pair from YourTable, using COPY_YourTable

  delete from YourTable
  where exists 
   (
     select 1
     from COPY_YourTable c
     where  c.Card_id = YourTable.Card_id
     and c.Time = YourTable.Time
   )

Copy one row per duplicated pair back into YourTable

   insert into YourTable (Card_id, Time)
   select Card_id, Time
   from COPY_YourTable
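The three steps can be tried end to end with a minimal sketch in Python using the built-in sqlite3 module. SQLite is a stand-in engine here (an assumption): it has no `select ... into`, so `create table ... as select` takes its place. The row for card 5678 is hypothetical sample data, added to show that rows without duplicates are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table YourTable (Card_id integer, Time text)")
conn.executemany(
    "insert into YourTable (Card_id, Time) values (?, ?)",
    [
        (1234, "5:30"), (1234, "5:45"), (1234, "5:30"), (1234, "5:45"),
        (5678, "6:00"),  # hypothetical row with no duplicate
    ],
)

# Step 1: one row per duplicated (Card_id, Time) pair.
# SQLite has no SELECT ... INTO, so CREATE TABLE ... AS stands in.
conn.execute(
    """
    create table COPY_YourTable as
    select Card_id, Time
    from YourTable
    group by Card_id, Time
    having count(1) > 1
    """
)

# Step 2: delete every copy of each duplicated pair.
conn.execute(
    """
    delete from YourTable
    where exists (
        select 1
        from COPY_YourTable c
        where c.Card_id = YourTable.Card_id
          and c.Time = YourTable.Time
    )
    """
)
after_delete = conn.execute("select Card_id, Time from YourTable").fetchall()

# Step 3: put one copy of each pair back.
conn.execute(
    "insert into YourTable (Card_id, Time) select Card_id, Time from COPY_YourTable"
)
final = conn.execute(
    "select Card_id, Time from YourTable order by Card_id, Time"
).fetchall()
print(after_delete)
print(final)
```

After step 2 only the non-duplicated row is left, and step 3 restores a single copy of each duplicated pair. Only Card_id and Time are carried through COPY_YourTable, so any other columns on the deleted rows would be lost unless they are added to the copy as well.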
