1

I have duplicate records in my table with below 3 scenario:

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000
22344222    2016-06-06 00:00:00.000
22344222    2016-06-20 00:00:00.000
22344222    2016-07-25 00:00:00.000
22344222    2016-09-26 00:00:00.000
22344222    2016-10-03 00:00:00.000
22344222    2016-10-26 00:00:00.000
22344222    2016-10-27 00:00:00.000
22344222    2016-10-28 00:00:00.000

22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-04-23 00:00:00.000
22344224    2016-04-24 00:00:00.000
22344224    2016-04-25 00:00:00.000
22344224    2016-04-26 00:00:00.000
22344224    2016-06-10 00:00:00.000

I want to delete all the duplicate records except for the 2 rows where the 1st row should be the one having least add date and the 2nd row where the date difference between the adddate is having 45 days.

In the above three scenario I should be able to preserve only the below data

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000

22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-06-06 00:00:00.000
7
  • will you format this better, I.E using the code tags and put it into a tabular layout Commented Dec 12, 2016 at 14:16
  • DO you mean "exactly" 45 days, or "at least" 45 days ? Commented Dec 12, 2016 at 14:17
  • exactly 45 days Commented Dec 12, 2016 at 14:18
  • datediff between what? Commented Dec 12, 2016 at 14:21
  • what if there is no record exactly after 45 days? Commented Dec 12, 2016 at 14:23

2 Answers 2

5

try this:

With mad(record, minDat) as 
   (Select record, min(addDate)
    From myTable
    group by record)
Delete t
from mytable t join mad m 
   on m.record = t.Record
where t.adddate not in 
   (m.minDat, dateadd(day, 45, m.minDat))

problem is you have 13 records in the source data for record 22344223 that are all the same.

If you only want one copy of these 13 duplicates, then, after deleting the records,

create table dbo.temp (record integer, addDate date)
Insert dbo.temp(record, addDate)
Select distinct record, addDate
from mytable
-- ------------------------
Drop table myTable
-- ------------------------
exec sp_Rename 'dbo.temp', 'dbo.mytable'   
Sign up to request clarification or add additional context in comments.

2 Comments

in that scenario only the 1st row need to be preserved
How is the computer to delete only the first row? How does it know which is first? the rows are not ordered. If there was a column Primary Key, or a date entered, then easy. When the rows are the same there is no way to do this directly with a single standard SQL statement. the only way is to create a new table with only distinct values from this one, drop this one and rename the new one.
0

I used a table name of Test since it was not provided.

WITH cte AS
(SELECT   *, NumSeq = ROW_NUMBER() OVER (PARTITION BY record ORDER BY Adddate) FROM   dbo.Test)
DELETE  FROM cte
WHERE cte.NumSeq > 1
    AND NOT EXISTS ( SELECT 1
                       FROM cte AS A
                       INNER JOIN cte AS B
                                       ON B.NumSeq > 1
                                          AND   DATEDIFF( DAY, A.Adddate, B.Adddate ) = 45
                                          AND   A.record = B.record
                       WHERE A.NumSeq = 1
                             AND cte.record = B.record
                             AND cte.Adddate = B.Adddate
                   );

SELECT  * FROM  dbo.Test;

This returns 4 rows:

22344222 2016-04-22 00:00:00.000

22344222 2016-06-06 00:00:00.000

22344223 2016-04-22 00:00:00.000

22344224 2016-04-22 00:00:00.000

Edit: Note the desired results shows 5/6 as the 2nd row for 22344222, but that is not 45 days after 4/22. My result returns the row for 6/6. Also, if I add the 6/6 date for 22344224 to the source data, my result will return 5 rows instead.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.