I need to delete around 300,000 duplicates in my database. I want to check the Card_id column for duplicates, then check for duplicate timestamps. Then delete one copy and keep one. Example:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
| 1234 | 5:30 |
| 1234 | 5:45 |
So remaining data would be:
| Card_id | Time |
| 1234 | 5:30 |
| 1234 | 5:45 |
I have tried several different delete statements, and merging into a new table but with no luck.
UPDATE: Got it working!
Alright after many failures I got this to work for DB2.
delete from(
select card_id, time, row_number() over (partition by card_id, time) rn
from card_table) as A
where rn > 1
rn increments when there are duplicates for card_id and time. The duplicated, or second rn, will be deleted.
idcolumn to identify records uniquely?