How Do You Delete Duplicate Records In SQL

Question

How to delete duplicate records in sql?

How did you get duplicates in the first place? Don't you have any uniqueness constraints? What about a PK? What is your definition of "duplicate"? — Mark Byers
– Mark Byers, Commented Feb 3, 2010 at 12:39
To give you the benefit of the doubt I am assuming that you have primary keys setup, but the nature of the data allows for duplicates like a table of names. But, depending on the data there are different techniques that you can use to find duplicates. So, the first thing we will need is the structure of the table including primary key and constraints, and the kind of data including what you consider as duplicates — Waleed Al-Balooshi
– Waleed Al-Balooshi, Commented Feb 3, 2010 at 13:16

Dave Markle · Accepted Answer · 2010-02-03 13:39:56Z

7

In SQL Server 2005 and above:

WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY dup_column ORDER BY dup_column) AS rn
        FROM    mytable
        )
DELETE
FROM    q
WHERE   rn > 1

edited Feb 3, 2010 at 13:39

Dave Markle

98.3k20 gold badges152 silver badges172 bronze badges

answered Feb 3, 2010 at 12:57

Quassnoi

427k94 gold badges628 silver badges623 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

KM. Over a year ago

great blog article about this: explainextended.com/2009/03/14/deleting-duplicates

Dave Markle Over a year ago

huh. never thought this would be allowed... but lo and behold, it works. +1

APC · Accepted Answer · 2010-02-03 13:02:51Z

Here is how to do it in Oracle, using ROWID. Different flavours of RDBMS will have their own equivalent.

I start by creating some duplicate records ...

SQL> select t, count(*) from t23 group by t;

T       COUNT(*)
----- ----------
09:00          2
12:00          2
10:30          2
11:00          2
12:30          2
08:00          2
10:45          2
11:15          2

8 rows selected.

SQL>

... and now I zap them, using T to define "duplicate records"...

SQL> delete from t23
  2  where rowid > ( select min(rowid) from t23 x
  3                  where x.t = t23.t )
  4  /

8 rows deleted.

SQL> select t, count(*) from t23 group by t;

T       COUNT(*)
----- ----------
09:00          1
12:00          1
10:30          1
11:00          1
12:30          1
08:00          1
10:45          1
11:15          1

8 rows selected.

SQL>

Note that in the sub-query you have to include as many columns as necessary to specify what constitutes uniquenss. This could end up being the whole record, although one would hope not.

Incidentally, the most efficient way of doing this is not to have duplicate records in the first place. Which is why Nature gave us primary keys and unique constraints.

Dave Markle · Accepted Answer · 2017-05-28 00:28:47Z

3

Since you don't have a key on the table (assuming your rows are 100% duplicated), you won't have any problems with other tables referencing your table with a FOREIGN KEY.

The fastest and least complicated way of doing this is:

SELECT DISTINCT *
INTO #tmp
FROM YourTable;

TRUNCATE TABLE YourTable;

INSERT YourTable
SELECT * from #tmp;

Maybe consider adding some version of this statement to the end ;-)

ALTER YourTable ADD CONSTRAINT PK_YourTable PRIMARY KEY (whatever, keeps, this, from, happening, again);

edited May 28, 2017 at 0:28

answered Feb 3, 2010 at 12:54

Dave Markle

98.3k20 gold badges152 silver badges172 bronze badges

Comments

Matt · Accepted Answer · 2016-10-12 12:49:28Z

0

In Postgresql, I use the following:

DELETE FROM table_name q
WHERE EXISTS (
   SELECT 1
   FROM   table_name q1
   WHERE  q1.ctid < q.ctid
   AND    q.fid = q1.fid
   );

Where the fid is your unique id number, or primary key.

answered Oct 12, 2016 at 12:49

Matt

1211 silver badge5 bronze badges

Comments

Jay · Accepted Answer · 2017-01-17 02:52:29Z

0

In MS SQL,

DELETE Table1 FROM Table1 
INNER JOIN ( 
   SELECT MAX(lineitem) AS lineitem, ID, COUNT (ID) AS IDCount 
   FROM Table1 
   GROUP BY ID HAVING COUNT (ID) > 1) AS Table2 
ON Table1.ID = Table2.ID and Table1.LineItem = Table2.lineitem

Above SQL will find all duplicate ID and delete the one with maximum LineItem.

ID         LineItem
---        --------
111        1
111        2 (Deleted)
222        1
222        2 (Deleted)

answered Jan 17, 2017 at 2:52

Jay

11 bronze badge

Comments

Vinod L · Accepted Answer · 2017-12-14 12:04:49Z

0

In Oracle, We can do it using many ways.

1) By Creating a New Table :

create table emp2 as
select distinct * from EMP;
drop table emp;
alter table emp2 rename to emp;

2) By Using RowID :

delete from EMP where rowid not in (
select max(rowid) from EMP group by EMPNO, EMPNAME, SALARY);

3) By Using Self Join and RowID :

delete from emp e1 where rowid not in(
select max(rowid) from EMP e2
where e1.EMPNO = e2.EMPNO);

answered Dec 14, 2017 at 12:04

Vinod L

173 bronze badges

Comments

Philluminati · Accepted Answer · 2010-02-03 12:51:33Z

-2

select col from table;

select distinct col from table;

answered Feb 3, 2010 at 12:51

Philluminati

2,8192 gold badges27 silver badges36 bronze badges

1 Comment

smci Over a year ago

He said delete the duplicates, not just avoid selecting them.

Collectives™ on Stack Overflow

How Do You Delete Duplicate Records In SQL

7 Answers 7

2 Comments

Comments

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

2 Comments

Comments

Comments

Comments

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related