2

I have a database table with 3 columns. I want to find all duplicates that have snuck in unnoticed and tidy them up.

The table is structured approximately like this:

ID      ColumnA     ColumnB
0       aaa         bbb
1       aaa         ccc
2       aaa         bbb
3       xxx         bbb

So what would my query look like to return columns 0 and 2 as both column A and column B make a combined duplicate entry?

Standard SQL is preferred, but this is running on SQL Server 2008.

4
  • Do you mean "...to return rows 0 and 2..."? Commented Mar 17, 2015 at 10:41
  • I guess you want to delete them? Just use ROW_NUMBER(). Good example here: stackoverflow.com/questions/15053693/… Commented Mar 17, 2015 at 10:45
  • And when finished, add a unique constraint on (ColumnA, ColumnB) to your table... Commented Mar 17, 2015 at 11:17
  • Yes, return 0 and 2 to start with. For review as the real life table is slightly more complex than the example Commented Mar 17, 2015 at 13:39

6 Answers

2

You can create a query that groups and counts the duplicate rows:

SELECT  COUNT(1) , ColumnA , ColumnB
FROM    YourTable
GROUP BY ColumnA , ColumnB
HAVING  COUNT(1) > 1

You can then add this to a subquery to output the full rows that hold the duplicate data.

Here's a full executable example based on your sample data:

CREATE TABLE #YourTable
    ([ID] INT, [ColumnA] VARCHAR(3), [ColumnB] VARCHAR(3))
;

INSERT INTO #YourTable
    ([ID], [ColumnA], [ColumnB])
VALUES
    (0, 'aaa', 'bbb'),
    (1, 'aaa', 'ccc'),
    (2, 'aaa', 'bbb'),
    (3, 'xxx', 'bbb')
;

SELECT  *
FROM    #YourTable t1
WHERE   EXISTS ( SELECT COUNT(1) , ColumnA , ColumnB
                 FROM   #YourTable
                 WHERE  t1.ColumnA = ColumnA AND t1.ColumnB = ColumnB
                 GROUP BY ColumnA , ColumnB
                 HAVING COUNT(1) > 1 )

DROP TABLE #YourTable

2

Use count(*) as a window function:

select t.*
from (select t.*, count(*) over (partition by columna, columnb) as cnt
      from YourTable t  -- placeholder: substitute your own table name
     ) t
where cnt > 1;

0

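You can try this. Note that it returns only the extra copies (every row after the first in each ColumnA/ColumnB group, ordered by ID), not the first occurrence: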

with x as   (select  *,rn = row_number()
            over(PARTITION BY columnA,columnB order by ID)
            from    #temp1)

select * from x where rn > 1
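
If the end goal is to remove the extra copies, as the comments on the question suggest, the same CTE can drive a DELETE. This is only a minimal sketch: it assumes SQL Server (which allows deleting through a CTE), the #temp1 table from the query above, and that the row with the lowest ID in each group is the one to keep. Run the SELECT version first to review what would be removed.

```sql
-- Assumes SQL Server and the #temp1 table used above.
-- Keeps the row with the lowest ID in each (columnA, columnB) group
-- and deletes every later row in that group.
;with x as  (select  ID, columnA, columnB, rn = row_number()
            over(PARTITION BY columnA,columnB order by ID)
            from    #temp1)

delete from x where rn > 1;
```

The leading semicolon is there because a WITH clause must follow a terminated statement when it is not the first statement in the batch.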

0

You can use a sub-select with a HAVING clause to find duplicated ColumnA-ColumnB pairs, then the outer SELECT just returns the matching rows.

select t1.* from MyTable t1
inner join (select ColumnA, ColumnB 
            from MyTable 
            group by ColumnA, ColumnB 
            having count(*) > 1) t2 on t2.ColumnA = t1.ColumnA 
                                   and t2.ColumnB = t1.ColumnB

0

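Here is a version without aggregate functions. Note that with three or more copies of the same pair, this join returns each row more than once: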

SELECT
    a.*
FROM
    #tbl a
    JOIN #tbl b ON a.[ColumnA] = b.[ColumnA]
                   AND a.[ColumnB] = b.[ColumnB]
                   AND a.id <> b.id

OR

SELECT
    a.*
FROM
    #tbl a
WHERE
    EXISTS ( SELECT
                *
             FROM
                #tbl b
             WHERE
                a.[ColumnA] = b.[ColumnA]
                AND a.[ColumnB] = b.[ColumnB]
                AND a.ID <> b.ID )

OR

SELECT * FROM (
SELECT
    a.*, COUNT(*) OVER (PARTITION BY [ColumnA], [ColumnB]) cnt
FROM
    #tbl a
) a
WHERE cnt > 1

-1

This approach can be controversial, and many people may call it "bad practice", but it directly translates "pick all duplicate stuff from 'table'". It also works with a DELETE statement.

SELECT FROM mytable WHERE Id NOT IN 
    (SELECT Id FROM 
        (SELECT Id, concat(ColumnA,'-',ColumnB) AS x FROM mytable
            GROUP BY x) AS innerTable);
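
For reference, the same idea can be written in a form closer to standard SQL that should also run on SQL Server. This is a sketch of an alternative formulation, not an exact rewrite of the query above: it assumes Id is unique and never NULL, and it treats the lowest Id in each (ColumnA, ColumnB) pair as the row to keep.

```sql
-- Assumption: Id is unique and NOT NULL in mytable.
-- The subquery picks one Id to keep per (ColumnA, ColumnB) pair;
-- the outer query returns every other row, i.e. the extra copies.
SELECT *
FROM   mytable
WHERE  Id NOT IN (SELECT MIN(Id)
                  FROM   mytable
                  GROUP BY ColumnA, ColumnB);
```

With the sample data from the question, this returns only row 2, since row 0 is the copy that is kept.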

You could also, where possible (or necessary), add a unique constraint on those columns. The constraint can only be created once the existing duplicates have been removed.

ALTER TABLE mytable
ADD CONSTRAINT uniqueColA_ColB UNIQUE (ColumnA,ColumnB);

The database will then raise an error on any attempt to insert duplicate values.

2 Comments

That SELECT statement seems to be incomplete for any dialect of SQL I recognise.
The inner select picks the first entry of each group where column A and column B repeat themselves; the outer select picks whatever is NOT IN that list. I had to use two inner selects because you can't reference the same table as the outer select in the first-level inner select.
