How do I find duplicates across multiple columns?

Question

So I want to do something like this SQL:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

to produce the following output (ignoring rows where only the name or only the city matches; both columns have to match):

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

Michał Powaga · Accepted Answer · 2011-11-16 09:26:21Z

194

Duplicated ids for each (name, city) pair:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
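A minimal runnable sketch of this join-to-grouped-subquery approach follows. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for SQL Server, so the `[stuff]` brackets are dropped. It uses the sample rows from the question plus one non-duplicate row. The helper names `make_db` and `find_duplicates` are illustrative only.

```python
import sqlite3

# Sample rows from the question, plus one row whose (name, city) pair is unique.
ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    """Build an in-memory stuff(id, name, city) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    # Inner query: one row per (name, city) pair that occurs more than once.
    # Outer join: every original row that belongs to one of those pairs.
    sql = """
        SELECT s.id, t.name, t.city, t.qty
        FROM stuff s
        JOIN (
            SELECT name, city, COUNT(*) AS qty
            FROM stuff
            GROUP BY name, city
            HAVING COUNT(*) > 1
        ) t ON s.name = t.name AND s.city = t.city
        ORDER BY s.id
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
for row in find_duplicates(conn):
    print(row)
```

Every duplicated row keeps its own id. Barney (923457) is not returned because his pair occurs only once.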

answered Nov 16, 2011 at 9:26

Michał Powaga

4 Comments

Adam Parkin Over a year ago

Note that if either name or city contains NULL, those rows are grouped together by the inner query but fail to match in the outer query, so they are not reported.

Adam Parkin Over a year ago

If the values can possibly contain null then (unless I'm missing something) you need to change it to a CROSS JOIN (full Cartesian product) and then add a WHERE clause such as:

WHERE ((s.name = t.name) OR (s.name is null and t.name is null)) AND ((s.city = t.city) OR (s.city is null and t.city is null))
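The sketch below checks the NULL behaviour described in these comments. It assumes SQLite via Python's `sqlite3` module, and the rows (two "Fred" rows with an unknown city) are hypothetical. It compares the plain `=` join against the cross join with the WHERE clause suggested above.

```python
import sqlite3

# Hypothetical rows: the two Fred rows have an unknown (NULL) city.
ROWS = [
    (1, "jim", "London"),
    (2, "jim", "London"),
    (3, "Fred", None),
    (4, "Fred", None),
]

# GROUP BY puts NULLs into one group, so (Fred, NULL) counts as a duplicate pair.
GROUPED = """
    SELECT name, city FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
"""


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def plain_join_ids(conn):
    # NULL = NULL is not true, so the Fred rows never satisfy the ON clause.
    sql = f"""
        SELECT s.id FROM stuff s
        JOIN ({GROUPED}) t ON s.name = t.name AND s.city = t.city
        ORDER BY s.id
    """
    return [r[0] for r in conn.execute(sql)]


def null_safe_ids(conn):
    # The comment's fix: cross join, then treat two NULLs as a match.
    sql = f"""
        SELECT s.id FROM stuff s
        CROSS JOIN ({GROUPED}) t
        WHERE ((s.name = t.name) OR (s.name IS NULL AND t.name IS NULL))
          AND ((s.city = t.city) OR (s.city IS NULL AND t.city IS NULL))
        ORDER BY s.id
    """
    return [r[0] for r in conn.execute(sql)]


conn = make_db(ROWS)
print(plain_join_ids(conn))
print(null_safe_ids(conn))
```

In SQLite specifically, `s.city IS t.city` is a shorter null-safe comparison with the same effect.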

Crayons Over a year ago

This answer will not return unique IDs of each duplicated record. Instead, it will merge the duplicated records into a single record and choose whichever ID appears first in the table. I believe the answer by @ssarabando is a more appropriate answer.

Jack B Over a year ago

@Crayons Both answers will return the same results. See dbfiddle.uk/… and dbfiddle.uk/…

Plamen G · Accepted Answer · 2015-05-07 19:04:20Z

122

 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1
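A small runnable sketch shows what this query returns. It assumes SQLite via Python's `sqlite3` module and uses the question's sample rows plus one unique row; the helper names are illustrative. The result has one row per duplicated pair, without the individual ids.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def duplicate_pairs(conn):
    # GROUP BY collapses each (name, city) pair into one row, so ids are not
    # available here; HAVING keeps only the pairs seen more than once.
    sql = """
        SELECT name, city, COUNT(*) AS qty
        FROM stuff
        GROUP BY name, city
        HAVING COUNT(*) > 1
        ORDER BY name, city
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
print(duplicate_pairs(conn))
```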

edited May 7, 2015 at 19:04

Plamen G

answered May 7, 2015 at 18:59

Sunnny

5 Comments

Juan.Queiroz Over a year ago

With that, you can't know the id of each line.

yoyo Over a year ago

Replace SELECT name, city, count(*) as qty with SELECT * to see all columns, including id.

nutty about natty Over a year ago

@yoyo Your suggestion gives an error; please suggest the entire SQL which does not give an error.

nutty about natty Over a year ago

To also see (at least) the min/max values of ID: SELECT max(id), min(id), name, city, count(*) as qty FROM stuff GROUP BY name, city HAVING count(*)> 1

Crayons Over a year ago

This answer doesn't address the question, in that it does not return the unique ID.

ssarabando · Accepted Answer · 2011-11-16 09:25:24Z

34

Something like this will do the trick. I don't know about the performance, so do run some tests.

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
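Here is a runnable sketch of the correlated-subquery version. It assumes SQLite via Python's `sqlite3` module and the question's sample rows; the helper names and the index name `ix_stuff_name_city` are hypothetical. The subquery runs once per outer row, so an index on (name, city) is likely to help. As the answer says, measure on your own data.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    # Hypothetical index name; supports the per-row lookup in the subquery.
    conn.execute("CREATE INDEX ix_stuff_name_city ON stuff (name, city)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    # For each outer row s, count the rows sharing its name and city.
    # s itself is counted, so a count above 1 means at least one other match.
    sql = """
        SELECT id, name, city
        FROM stuff s
        WHERE 1 < (SELECT COUNT(*) FROM stuff i
                   WHERE i.city = s.city AND i.name = s.name)
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
for row in find_duplicates(conn):
    print(row)
```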

answered Nov 16, 2011 at 9:25

ssarabando

4 Comments

Crayons Over a year ago

This is an under-rated answer and I believe it's the best one here. This answer identifies duplicates while returning individual records and their unique IDs. The marked answer groups the results, meaning you cannot actually identify the duplicates by their unique IDs, and is therefore a less useful dataset.

Jack B Over a year ago

@Crayons Both answers will return the same results. See dbfiddle.uk/… and dbfiddle.uk/…

Niek Jonkman Over a year ago

I am wondering, does this query always return ordered results? When I run this on my database (select city, name, id) I always get the results ordered based on city even though I do not specify it to order.

ssarabando Over a year ago

@NiekJonkman AFAIK ordering without an ORDER BY clause is "arbitrary": it depends on the query plan chosen by SQL Server, which may use a clustered key, an index, etc. This article by Brent Ozar explains it nicely IMO: brentozar.com/archive/2020/04/…

Community · Accepted Answer · 2020-06-20 09:12:55Z

Using count(*) over(partition by...) provides a simple and efficient way to locate unwanted repetition, while also listing all affected rows and all wanted columns:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city
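The same window-function query can be tried from Python. This sketch assumes SQLite 3.25 or newer, the first SQLite release with window functions, reached through the standard `sqlite3` module, with the question's sample rows. The helper names are illustrative. `t.id` is added to ORDER BY only to make the row order within each group deterministic.

```python
import sqlite3

# Window functions need SQLite 3.25+; fail early with a clear message otherwise.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite too old for window functions: " + sqlite3.sqlite_version)

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    # COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row
    # without collapsing rows, so ids survive; the outer WHERE keeps groups > 1.
    sql = """
        SELECT t.id, t.name, t.city, t.qty
        FROM (
            SELECT s.*, COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
            FROM stuff s
        ) t
        WHERE t.qty > 1
        ORDER BY t.name, t.city, t.id
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
for row in find_duplicates(conn):
    print(row)
```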

Most recent RDBMS versions support count(*) over(partition by...); MySQL added "window functions" in version 8.0. The example below runs in MySQL 8.0:

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);

INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

db<>fiddle here

Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.

TylerH · Accepted Answer · 2023-02-21 22:45:50Z

7

I found this way to be pretty flexible / efficient

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city

edited Feb 21, 2023 at 22:45

TylerH

answered Apr 9, 2019 at 17:53

MattD

1 Comment

nutty about natty Over a year ago

select distinct ... might be what's needed/missing here, no?
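A quick check of that comment follows. It assumes SQLite via Python's `sqlite3` module, the question's sample rows, and an illustrative helper `self_join_ids`. The comma-separated FROM in the answer is an inner join; here it is written as an explicit JOIN with the same conditions. Without DISTINCT, a row in a group of n identical (name, city) rows comes back n - 1 times.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def self_join_ids(conn, distinct=False):
    # Each row pairs with every *other* row sharing its name and city,
    # so it repeats once per partner unless DISTINCT removes the repeats.
    keyword = "DISTINCT" if distinct else ""
    sql = f"""
        SELECT {keyword} s1.id, s1.name, s1.city
        FROM stuff s1
        JOIN stuff s2
          ON s1.id <> s2.id
         AND s1.name = s2.name
         AND s1.city = s2.city
        ORDER BY s1.id
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_db(ROWS)
print(self_join_ids(conn))                 # Fred rows appear twice each
print(self_join_ids(conn, distinct=True))  # one row per id
```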

TylerH · Accepted Answer · 2023-02-21 22:46:01Z

7

SELECT Field1, Field2, COUNT(*)
FROM table_name
GROUP BY Field1, Field2
HAVING COUNT(*)>1

This lists every (Field1, Field2) combination that occurs more than once, together with its count.

edited Feb 21, 2023 at 22:46

TylerH

answered Apr 30, 2021 at 9:31

Arunav dutta gupta

Anja · Accepted Answer · 2011-11-16 09:22:08Z

2

You have to self-join stuff and match on both name and city, then group and filter on the count.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.name AND s.city = p.city
)
group by s.name having count(s.name) > 1

answered Nov 16, 2011 at 9:22

Anja

1 Comment

gbn Over a year ago

Fails in SQL Server: all non-aggregate columns must be in the GROUP BY
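One way to make the self-join-and-count idea valid where every non-aggregate column must be grouped is to group by all selected columns. This is a sketch, not the answer's original code. It assumes SQLite via Python's `sqlite3` module, the question's sample rows, and that `id` is unique. Each row also joins to itself, so `COUNT(*) > 1` means another row shares both its name and its city.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    # Match on BOTH columns; the self-match is included in the count.
    # Every selected column is listed in GROUP BY.
    sql = """
        SELECT s.id, s.name, s.city
        FROM stuff s
        JOIN stuff p ON s.name = p.name AND s.city = p.city
        GROUP BY s.id, s.name, s.city
        HAVING COUNT(*) > 1
        ORDER BY s.id
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
for row in find_duplicates(conn):
    print(row)
```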

D-Shih · Accepted Answer · 2022-05-11 07:28:40Z

The OP wants to group by some columns while also returning other columns that aren't grouping columns.

So a regular GROUP BY + HAVING on its own won't work.

I would use an EXISTS subquery with HAVING.

Put the columns that define a duplicate in the subquery:

SELECT s.id, s.name,s.city 
FROM stuff s
WHERE EXISTS (
   SELECT 1
   FROM stuff ss
   WHERE 
      s.name = ss.name
   AND
      s.city = ss.city
   GROUP BY ss.name,ss.city
   HAVING COUNT(*) > 1
)
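Here is a runnable sketch of the EXISTS + HAVING query, including the index from this answer. It assumes SQLite via Python's `sqlite3` module and the question's sample rows. The helper names are illustrative. The subquery only succeeds when the outer row's (name, city) group has more than one member.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE INDEX IX_name ON stuff (name, city)")  # index from the answer
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def find_duplicates(conn):
    # The correlated subquery restricts ss to the outer row's (name, city) group;
    # HAVING COUNT(*) > 1 makes EXISTS true only for groups with duplicates.
    sql = """
        SELECT s.id, s.name, s.city
        FROM stuff s
        WHERE EXISTS (
            SELECT 1
            FROM stuff ss
            WHERE s.name = ss.name AND s.city = ss.city
            GROUP BY ss.name, ss.city
            HAVING COUNT(*) > 1
        )
        ORDER BY s.id
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
for row in find_duplicates(conn):
    print(row)
```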

With a suitable index, this may perform better than a join:

CREATE INDEX IX_name ON stuff (
    name,
    city
);

Another way is to use COUNT as a window function, with the grouping columns in the PARTITION BY clause, and then filter on the count:

SELECT s.id, s.name,s.city 
FROM (
   SELECT *,COUNT(*) OVER(PARTITION BY name,city) cnt
   FROM stuff 
) s
WHERE cnt > 1

sqlfiddle

Md. Suman Kabir · Accepted Answer · 2022-08-26 19:13:51Z

2

Here is another way to get the required output, using CROSS APPLY:

select s.* from stuff s
cross apply(
    select name, city from stuff
    group by name, city
    having Count(*) > 1) x
where s.name = x.name and s.city=x.city

answered Aug 26, 2022 at 19:13

Md. Suman Kabir

Georges Legros · Accepted Answer · 2017-11-14 15:18:41Z

-1

Given a staging table with 70 columns, of which only 4 define a duplicate, this code returns each duplicated combination of those 4 columns with its count:

SELECT 
    COUNT(*)
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1
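The LTRIM(RTRIM(...)) wrapping matters when copies differ only by padding. This sketch assumes SQLite via Python's `sqlite3` module. SQLite also provides one-argument `LTRIM`/`RTRIM`, which strip spaces. The two-column table and its rows are hypothetical stand-ins for the 70-column staging table.

```python
import sqlite3

# Hypothetical rows: rows 1 and 2 differ only by leading/trailing spaces.
ROWS = [
    (1, "jim", "London"),
    (2, " jim", "London  "),
    (3, "Fred", "Paris"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", rows)
    return conn


def duplicate_keys(conn, trimmed):
    # Using the same trimmed expressions in SELECT and GROUP BY
    # puts padded copies into the same group.
    key = "LTRIM(RTRIM(name)), LTRIM(RTRIM(city))" if trimmed else "name, city"
    sql = f"""
        SELECT COUNT(*), {key}
        FROM stuff
        GROUP BY {key}
        HAVING COUNT(*) > 1
    """
    return conn.execute(sql).fetchall()


conn = make_db(ROWS)
print(duplicate_keys(conn, trimmed=False))  # padding hides the duplicate
print(duplicate_keys(conn, trimmed=True))   # trimmed keys expose it
```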

edited Nov 14, 2017 at 15:18

Georges Legros

answered Nov 14, 2017 at 13:30

Don G.
