Find duplicates in a database table, where 2 columns are duplicated

Question

I have a database table with 3 columns. I want to find all the duplicates that have snuck in unnoticed and tidy them up.

The table is structured approximately like this:

ID      ColumnA     ColumnB
0       aaa         bbb
1       aaa         ccc
2       aaa         bbb
3       xxx         bbb

So what would my query look like to return rows 0 and 2, since ColumnA and ColumnB together make a duplicate entry?

Standard SQL is preferred, but it is running on SQL Server 2008.

I guess you want to delete them? Just use ROW_NUMBER(). Good example here: stackoverflow.com/questions/15053693/… – Evaldas Buinauskas, Mar 17, 2015 at 10:45
And when finished, add a unique constraint on (ColumnA, ColumnB) to your table... – jarlh, Mar 17, 2015 at 11:17
Yes, return 0 and 2 to start with, for review, as the real-life table is slightly more complex than the example. – t0mmyw, Mar 17, 2015 at 13:39

Tanner · Accepted Answer · 2015-03-17 10:48:11Z

You can create a query that groups and counts the duplicate rows:

SELECT  COUNT(1) , ColumnA , ColumnB
FROM    YourTable
GROUP BY ColumnA , ColumnB
HAVING  COUNT(1) > 1
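To try the grouping query without a SQL Server instance, here is a minimal sketch. It runs the same GROUP BY / HAVING against the question's sample rows, using Python's built-in sqlite3 module with an in-memory database. SQLite is only a stand-in for SQL Server 2008 here (an assumption for portability). The table name YourTable is the placeholder from this answer.

```python
import sqlite3

# In-memory SQLite copy of the question's sample table (stand-in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb")],
)

# One row per (ColumnA, ColumnB) pair that occurs more than once, with its count.
dup_groups = conn.execute(
    """
    SELECT COUNT(1), ColumnA, ColumnB
    FROM YourTable
    GROUP BY ColumnA, ColumnB
    HAVING COUNT(1) > 1
    """
).fetchall()
print(dup_groups)  # [(2, 'aaa', 'bbb')]
```

This only identifies the duplicated pairs, not the individual rows (IDs 0 and 2) that hold them.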

You can then add this to a subquery to output the full rows that hold the duplicate data.

Here's a full executable example based on your sample data:

CREATE TABLE #YourTable
    ([ID] INT, [ColumnA] VARCHAR(3), [ColumnB] VARCHAR(3))
;

INSERT INTO #YourTable
    ([ID], [ColumnA], [ColumnB])
VALUES
    (0, 'aaa', 'bbb'),
    (1, 'aaa', 'ccc'),
    (2, 'aaa', 'bbb'),
    (3, 'xxx', 'bbb')
;

SELECT  *
FROM    #YourTable t1
WHERE   EXISTS ( SELECT 1   -- the select list is ignored inside EXISTS
                 FROM   #YourTable
                 WHERE  t1.ColumnA = ColumnA AND t1.ColumnB = ColumnB
                 GROUP BY ColumnA , ColumnB
                 HAVING COUNT(1) > 1 )

DROP TABLE #YourTable
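Here is the same correlated EXISTS idea as a self-contained sketch. It assumes SQLite via Python's sqlite3 module in place of SQL Server, with a regular in-memory table instead of the #YourTable temp table. For each row, the subquery counts rows with the same pair, and the row is kept only when that count exceeds 1.

```python
import sqlite3

# In-memory SQLite copy of the sample data (stand-in for the #YourTable temp table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb")],
)

# Keep a row when its (ColumnA, ColumnB) group has more than one member.
rows = conn.execute(
    """
    SELECT t1.ID, t1.ColumnA, t1.ColumnB
    FROM YourTable t1
    WHERE EXISTS (SELECT 1
                  FROM YourTable t2
                  WHERE t2.ColumnA = t1.ColumnA AND t2.ColumnB = t1.ColumnB
                  GROUP BY t2.ColumnA, t2.ColumnB
                  HAVING COUNT(1) > 1)
    ORDER BY t1.ID
    """
).fetchall()
print(rows)  # [(0, 'aaa', 'bbb'), (2, 'aaa', 'bbb')]
```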

Gordon Linoff · 2015-03-17 10:46:59Z

2

Use count(*) as a window function:

select t.*
from (select t.*, count(*) over (partition by columna, columnb) as cnt
      from YourTable t  -- "table" is a reserved word; substitute your real table name
     ) t
where cnt > 1;
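A minimal sketch of the windowed count, assuming SQLite through Python's sqlite3 module as a stand-in for SQL Server. SQLite supports window functions from version 3.25, so check sqlite3.sqlite_version if the query fails. Every row carries the size of its (ColumnA, ColumnB) partition, so no join or GROUP BY is needed.

```python
import sqlite3

# In-memory SQLite copy of the sample data (stand-in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb")],
)

# cnt = number of rows sharing this row's (ColumnA, ColumnB) pair.
rows = conn.execute(
    """
    SELECT ID, ColumnA, ColumnB
    FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY ColumnA, ColumnB) AS cnt
          FROM YourTable t) AS counted
    WHERE cnt > 1
    ORDER BY ID
    """
).fetchall()
print(rows)  # [(0, 'aaa', 'bbb'), (2, 'aaa', 'bbb')]
```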

answered Mar 17, 2015 at 10:46


Rahul Tripathi · 2015-03-17 10:41:30Z

0

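You can try something like this. It numbers the rows within each ColumnA/ColumnB pair, so rn > 1 returns only the second and later copies: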

with x as   (select  *,rn = row_number()
            over(PARTITION BY columnA,columnB order by ID)
            from    #temp1)

select * from x where rn > 1
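Because rn > 1 skips the first copy of each pair, this numbering is a natural basis for the cleanup suggested in the comments. Here is a minimal sketch under stated assumptions: SQLite 3.25+ via Python's sqlite3 module in place of SQL Server, and the row with the lowest ID treated as the one to keep. SQL Server also accepts DELETE FROM x WHERE rn > 1 directly against the CTE. SQLite cannot delete through a CTE, so the sketch uses an IN (...) subquery instead.

```python
import sqlite3

# In-memory SQLite copy of the sample data (stand-in for #temp1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb")],
)

# rn = 1 for the lowest ID in each pair; rn > 1 marks the extra copies.
extra_ids = [row[0] for row in conn.execute(
    """
    WITH x AS (
        SELECT ID, ROW_NUMBER() OVER (PARTITION BY ColumnA, ColumnB ORDER BY ID) AS rn
        FROM YourTable
    )
    SELECT ID FROM x WHERE rn > 1 ORDER BY ID
    """
)]
print(extra_ids)  # [2]

# Delete the extra copies, keeping the first row of each pair.
conn.execute(
    """
    DELETE FROM YourTable
    WHERE ID IN (SELECT ID
                 FROM (SELECT ID,
                              ROW_NUMBER() OVER (PARTITION BY ColumnA, ColumnB ORDER BY ID) AS rn
                       FROM YourTable) AS numbered
                 WHERE rn > 1)
    """
)
remaining = [row[0] for row in conn.execute("SELECT ID FROM YourTable ORDER BY ID")]
print(remaining)  # [0, 1, 3]
```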

answered Mar 17, 2015 at 10:41


Rhys Jones · 2015-03-17 10:44:47Z

0

You can use a sub-select with a HAVING clause to find duplicated ColumnA-ColumnB pairs, then the outer SELECT just returns the matching rows.

select * from MyTable t1
inner join (select ColumnA, ColumnB 
            from MyTable 
            group by ColumnA, ColumnB 
            having count(*) > 1) t2 on t2.ColumnA = t1.ColumnA 
                                   and t2.ColumnB = t1.ColumnB
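A self-contained sketch of the join against the grouped sub-select, assuming SQLite via Python's sqlite3 module in place of SQL Server. Each duplicated pair appears only once in t2, so each matching row in t1 is returned exactly once.

```python
import sqlite3

# In-memory SQLite copy of the sample data (stand-in for MyTable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb")],
)

# t2 holds each duplicated pair once; joining back recovers the full rows.
rows = conn.execute(
    """
    SELECT t1.ID, t1.ColumnA, t1.ColumnB
    FROM YourTable t1
    INNER JOIN (SELECT ColumnA, ColumnB
                FROM YourTable
                GROUP BY ColumnA, ColumnB
                HAVING COUNT(*) > 1) t2
        ON t2.ColumnA = t1.ColumnA AND t2.ColumnB = t1.ColumnB
    ORDER BY t1.ID
    """
).fetchall()
print(rows)  # [(0, 'aaa', 'bbb'), (2, 'aaa', 'bbb')]
```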

answered Mar 17, 2015 at 10:44


Dmitrij Kultasev · 2015-03-17 11:06:53Z

0

Here are some alternatives. The first two use no aggregate functions at all:

SELECT
    a.*
FROM
    #tbl a
    JOIN #tbl b ON a.[ColumnA] = b.[ColumnA]
                   AND a.[ColumnB] = b.[ColumnB]
                   AND a.id <> b.id
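One thing to check with the plain self-join is what happens when a pair occurs three or more times. Each copy then matches every other copy, so it is returned once per match. The sketch below assumes SQLite via Python's sqlite3 module and adds a hypothetical third copy (ID 4) that is not in the question's data. It shows the repetition and how SELECT DISTINCT removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb"),
        (4, "aaa", "bbb"),  # hypothetical third copy, not in the question's data
    ],
)

# Plain self-join: each of IDs 0, 2, 4 matches the other two, so appears twice.
plain_ids = [row[0] for row in conn.execute(
    """
    SELECT a.ID
    FROM YourTable a
    JOIN YourTable b ON a.ColumnA = b.ColumnA
                    AND a.ColumnB = b.ColumnB
                    AND a.ID <> b.ID
    ORDER BY a.ID
    """
)]
print(plain_ids)  # [0, 0, 2, 2, 4, 4]

# DISTINCT collapses the repeated matches back to one row per ID.
distinct_ids = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT a.ID
    FROM YourTable a
    JOIN YourTable b ON a.ColumnA = b.ColumnA
                    AND a.ColumnB = b.ColumnB
                    AND a.ID <> b.ID
    ORDER BY a.ID
    """
)]
print(distinct_ids)  # [0, 2, 4]
```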

OR

SELECT
    a.*
FROM
    #tbl a
WHERE
    EXISTS ( SELECT
                *
             FROM
                #tbl b
             WHERE
                a.[ColumnA] = b.[ColumnA]
                AND a.[ColumnB] = b.[ColumnB]
                AND a.ID <> b.ID )
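The EXISTS form does not have the repetition problem. It only asks whether at least one other row shares the pair, so each row is returned at most once. Here is a sketch under the same assumptions (SQLite via Python's sqlite3 module, plus a hypothetical third copy with ID 4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        (0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb"),
        (4, "aaa", "bbb"),  # hypothetical third copy, not in the question's data
    ],
)

# A row qualifies if some *other* row (different ID) has the same pair.
ids = [row[0] for row in conn.execute(
    """
    SELECT a.ID
    FROM YourTable a
    WHERE EXISTS (SELECT *
                  FROM YourTable b
                  WHERE a.ColumnA = b.ColumnA
                    AND a.ColumnB = b.ColumnB
                    AND a.ID <> b.ID)
    ORDER BY a.ID
    """
)]
print(ids)  # [0, 2, 4]
```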

OR

SELECT * FROM (
SELECT
    a.*, COUNT(*) OVER (PARTITION BY [ColumnA], [ColumnB]) cnt
FROM
    #tbl a
) a
WHERE cnt > 1

answered Mar 17, 2015 at 11:06


Felype · 2015-03-17 12:27:54Z

-1

This approach can be controversial, and many people may call it "a bad practice", but it translates "pick all the duplicate stuff from the table" perfectly. Of course, it also works with a DELETE statement.

SELECT * FROM mytable WHERE Id NOT IN
    -- keep the lowest Id of each ColumnA/ColumnB pair; everything else is an extra copy
    (SELECT Id FROM
        (SELECT MIN(Id) AS Id FROM mytable
            GROUP BY ColumnA, ColumnB) AS innerTable);
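The idea in this answer is to keep one row per (ColumnA, ColumnB) pair and return everything else. Here is a minimal sketch of that idea, assuming SQLite via Python's sqlite3 module. MIN(Id) is taken as the row to keep, which is an assumption; any deterministic choice works. The sketch also shows why it is safer to group on the two columns than on a '-'-joined string. SQLite spells concatenation with ||. The hypothetical rows 4 ('a-', 'b') and 5 ('a', '-b') are different pairs, but both join to 'a--b', so the string version wrongly treats row 5 as a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER PRIMARY KEY, ColumnA TEXT, ColumnB TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (0, "aaa", "bbb"), (1, "aaa", "ccc"), (2, "aaa", "bbb"), (3, "xxx", "bbb"),
        # Hypothetical rows: different pairs whose '-'-joined strings are identical.
        (4, "a-", "b"), (5, "a", "-b"),
    ],
)

def extra_copies(group_expr):
    """Ids that are not the lowest Id within their group, i.e. the extra copies."""
    sql = f"""
        SELECT Id FROM mytable WHERE Id NOT IN
            (SELECT MIN(Id) FROM mytable GROUP BY {group_expr})
        ORDER BY Id
    """
    return [row[0] for row in conn.execute(sql)]

by_columns = extra_copies("ColumnA, ColumnB")
by_concat = extra_copies("ColumnA || '-' || ColumnB")
print(by_columns)  # [2]
print(by_concat)   # [2, 5]  (row 5 is wrongly flagged as a copy of row 4)
```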

Where possible, you could also add a unique constraint on those columns once the duplicates are gone:

ALTER TABLE mytable
ADD CONSTRAINT uniqueColA_ColB UNIQUE (ColumnA,ColumnB);

SQL Server will then raise an error whenever an INSERT or UPDATE would create a duplicate (ColumnA, ColumnB) pair.
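Here is a small sketch of how the constraint behaves, assuming SQLite via Python's sqlite3 module in place of SQL Server. SQLite has no ALTER TABLE ... ADD CONSTRAINT, so a unique index on the two columns enforces the same rule here. The sketch also shows why the cleanup has to come first: the index cannot be built while duplicates are still present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Id INTEGER PRIMARY KEY, ColumnA TEXT, ColumnB TEXT)")
# SQLite stand-in for ALTER TABLE ... ADD CONSTRAINT uniqueColA_ColB UNIQUE (...).
conn.execute("CREATE UNIQUE INDEX uniqueColA_ColB ON mytable (ColumnA, ColumnB)")

conn.execute("INSERT INTO mytable VALUES (0, 'aaa', 'bbb')")
conn.execute("INSERT INTO mytable VALUES (1, 'aaa', 'ccc')")  # new pair: accepted

try:
    conn.execute("INSERT INTO mytable VALUES (2, 'aaa', 'bbb')")  # repeats row 0's pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2

# A table that still contains a duplicate pair cannot get the unique index.
dirty = sqlite3.connect(":memory:")
dirty.execute("CREATE TABLE t (ColumnA TEXT, ColumnB TEXT)")
dirty.executemany("INSERT INTO t VALUES (?, ?)", [("aaa", "bbb"), ("aaa", "bbb")])
try:
    dirty.execute("CREATE UNIQUE INDEX ux ON t (ColumnA, ColumnB)")
    index_created = True
except sqlite3.DatabaseError:  # SQLite refuses to build the index while duplicates exist
    index_created = False
print(index_created)  # False
```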

edited Mar 17, 2015 at 12:27

answered Mar 17, 2015 at 12:10

2 Comments

Rhys Jones Over a year ago

That SELECT statement seems to be incomplete for any dialect of SQL I recognise.

Felype Over a year ago

The inner select picks the first entry of every ColumnA/ColumnB combination, and the outer select picks whatever is NOT IN that list. I had to use two levels of inner select because you can't reference the same table as the outer select in the first-level inner select.
