152

So I want to do something like this SQL code:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

The goal is to produce the following output. Rows where only the name or only the city matches should be ignored; both columns have to match:

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

10 Answers

194

This returns the id of every row whose (name, city) pair is duplicated:

select s.id, t.* 
from [stuff] s
join (
    select name, city, count(*) as qty
    from [stuff]
    group by name, city
    having count(*) > 1
) t on s.name = t.name and s.city = t.city
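Below is a minimal sketch of the join-to-grouped-subquery approach, run in SQLite through Python's built-in sqlite3 module. The SQL Server bracket quoting ([stuff]) is dropped, and t.* is spelled out as name, city, qty. An ORDER BY is added only so the output is deterministic. The row with id 100 is hypothetical and not from the question. Its name and city each appear elsewhere but never together, so it should not be reported.

```python
import sqlite3

# Rows from the question, plus one hypothetical row (id 100) whose name and
# city each occur elsewhere but never together, so it must NOT be reported.
ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
    (100, "jim", "Paris"),
]


def make_db():
    """Build an in-memory SQLite table named stuff holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)
    return conn


def find_duplicates(conn):
    """Join every row back to the (name, city) groups that occur more than once."""
    sql = """
        SELECT s.id, t.name, t.city, t.qty
        FROM stuff s
        JOIN (
            SELECT name, city, COUNT(*) AS qty
            FROM stuff
            GROUP BY name, city
            HAVING COUNT(*) > 1
        ) t ON s.name = t.name AND s.city = t.city
        ORDER BY s.id
    """
    return conn.execute(sql).fetchall()


for row in find_duplicates(make_db()):
    print(row)
```

Every duplicated row keeps its own id. The qty column shows how many rows share that (name, city) pair.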

4 Comments

Note that if either name or city contains NULL, those rows are grouped together by the inner query but fail to match in the outer join, so they are not reported.
If the values can possibly contain null then (unless I'm missing something) you need to change it to a CROSS JOIN (full Cartesian product) and then add a WHERE clause such as: WHERE ((s.name = t.name) OR (s.name is null and t.name is null)) AND ((s.city = t.city) OR (s.city is null and t.city is null))
This answer will not return the unique ID of each duplicated record. Instead, it will merge the duplicated records into a single record and choose whichever ID appears first in the table. I believe the answer by @ssarabando is a more appropriate answer.
@Crayons Both answers will return the same results. See dbfiddle.uk/… and dbfiddle.uk/…
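The NULL behaviour described in the first two comments can be checked with this sketch. It runs in SQLite via Python's sqlite3 module, using four hypothetical rows in which two "Fred" rows both have a NULL city. It compares the plain equality join with the CROSS JOIN plus null-aware WHERE clause proposed in the comment.

```python
import sqlite3

# Hypothetical rows: ids 3 and 4 are duplicates, but their city is unknown (NULL).
ROWS = [
    (1, "jim", "London"),
    (2, "jim", "London"),
    (3, "Fred", None),
    (4, "Fred", None),
]

# GROUP BY treats NULLs as equal, so (Fred, NULL) forms a group with qty = 2.
DUP_GROUPS = """
    SELECT name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
"""

# Plain equality join: NULL = NULL is not true, so ids 3 and 4 drop out here.
PLAIN = f"""
    SELECT s.id FROM stuff s
    JOIN ({DUP_GROUPS}) t ON s.name = t.name AND s.city = t.city
    ORDER BY s.id
"""

# The comment's version: CROSS JOIN plus an explicit "both NULL" check per column.
NULL_SAFE = f"""
    SELECT s.id FROM stuff s
    CROSS JOIN ({DUP_GROUPS}) t
    WHERE ((s.name = t.name) OR (s.name IS NULL AND t.name IS NULL))
      AND ((s.city = t.city) OR (s.city IS NULL AND t.city IS NULL))
    ORDER BY s.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

plain_ids = [r[0] for r in conn.execute(PLAIN)]
null_safe_ids = [r[0] for r in conn.execute(NULL_SAFE)]
print(plain_ids)      # [1, 2]
print(null_safe_ids)  # [1, 2, 3, 4]
```

The same null-aware condition could also go in the ON clause of an ordinary join. In SQLite specifically, `s.city IS t.city` is a shorter null-safe equality.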
122
 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1
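Here is a small SQLite sketch, run through Python's sqlite3 module, using the question's rows plus the non-duplicate "Barney" row. It shows what this query returns: one row per duplicated pair, with no individual ids. It also runs the MIN/MAX variant suggested in the comments. ORDER BY is added only to make the output stable.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

# One row per duplicated (name, city) pair; the individual ids are not visible.
groups = conn.execute("""
    SELECT name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
    ORDER BY name, city
""").fetchall()

# Variant from the comments: aggregate the ids to see at least their range.
id_ranges = conn.execute("""
    SELECT MAX(id), MIN(id), name, city, COUNT(*) AS qty
    FROM stuff
    GROUP BY name, city
    HAVING COUNT(*) > 1
    ORDER BY name, city
""").fetchall()

print(groups)  # [('Fred', 'Paris', 3), ('jim', 'London', 2)]
print(id_ranges)
```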

5 Comments

With that, you can't know the id of each line.
Replace SELECT name, city, count(*) as qty with SELECT * to see all columns, including id.
@yoyo Your suggestion gives an error; please suggest the entire SQL which does not give an error.
To also see (at least) the min/max values of ID: SELECT max(id), min(id), name, city, count(*) as qty FROM stuff GROUP BY name, city HAVING count(*)> 1
This answer doesn't address the question, in that it does not return the unique ID.
34

Something like this will do the trick. I don't know how it performs, so do run some tests.

select
  id, name, city
from
  [stuff] s
where
  1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
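As a sketch, here is the correlated-subquery version run in SQLite via Python's sqlite3 module. The bracket quoting is dropped and ORDER BY is added for stable output. The row with id 100 is hypothetical: it matches other rows on name alone and on city alone, so it should be excluded.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
    (100, "jim", "Paris"),  # hypothetical: only name OR only city repeats
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

# Keep a row when more than one row (itself included) shares its name AND city.
dupes = conn.execute("""
    SELECT id, name, city
    FROM stuff s
    WHERE 1 < (SELECT COUNT(*) FROM stuff i
               WHERE i.city = s.city AND i.name = s.name)
    ORDER BY id
""").fetchall()

for row in dupes:
    print(row)
```

The subquery is evaluated per outer row, so an index on (name, city) is likely to matter on large tables. That is worth measuring rather than assuming.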

4 Comments

This is an underrated answer and I believe it's the best one here. This answer identifies duplicates while returning individual records and their unique IDs. The marked answer groups the results, meaning you cannot actually identify the duplicates by their unique IDs, and is therefore a less useful dataset.
@Crayons Both answers will return the same results. See dbfiddle.uk/… and dbfiddle.uk/…
I am wondering, does this query always return ordered results? When I run this on my database (select city, name, id) I always get the results ordered based on city even though I do not specify it to order.
@NiekJonkman AFAIK ordering without an ORDER BY clause is "arbitrary", as in it'll depend on the query plan chosen by SQL Server. It can be using a clustered key, or an index, etc. This article by Brent Ozar explains it nicely IMO: brentozar.com/archive/2020/04/…
11

Using COUNT(*) OVER (PARTITION BY ...) provides a simple and efficient way to locate unwanted repetition, while also listing all affected rows and all wanted columns:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

Most recent RDBMS versions support COUNT(*) OVER (PARTITION BY ...), and MySQL added window functions in version 8.0. The demo below runs in MySQL 8.0:

CREATE TABLE stuff(
   id   INTEGER  NOT NULL
  ,name VARCHAR(60) NOT NULL
  ,city VARCHAR(60) NOT NULL
);
INSERT INTO stuff(id,name,city) VALUES 
  (904834,'jim','London')
, (904835,'jim','London')
, (90145,'Fred','Paris')
, (90132,'Fred','Paris')
, (90133,'Fred','Paris')

, (923457,'Barney','New York') # not expected in result
;
SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city
    id | name | city   | qty
-----: | :--- | :----- | --:
 90145 | Fred | Paris  |   3
 90132 | Fred | Paris  |   3
 90133 | Fred | Paris  |   3
904834 | jim  | London |   2
904835 | jim  | London |   2

db<>fiddle here

Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG(). For more information, see Section 12.21, “Window Functions”.

1 Comment

This is an extremely good solution, thanks.
7

I found this approach to be pretty flexible and efficient:

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city

1 Comment

select distinct ... might be what's needed/missing here, no?
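The comment's point can be checked with a quick SQLite sketch run through Python's sqlite3 module. The self-join emits one row for every (row, other matching row) pair. A group of three therefore appears six times, and DISTINCT collapses those repeats. The sketch keeps the answer's comma-join syntax and assumes, as the answer does, that id is unique.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

# The answer's self-join: each Fred row matches 2 others, each jim row matches 1.
pairs = conn.execute("""
    SELECT s1.id, s1.name, s1.city
    FROM stuff s1, stuff s2
    WHERE s1.id <> s2.id
      AND s1.name = s2.name
      AND s1.city = s2.city
""").fetchall()

# DISTINCT reduces this to one row per duplicated record.
distinct_rows = conn.execute("""
    SELECT DISTINCT s1.id, s1.name, s1.city
    FROM stuff s1, stuff s2
    WHERE s1.id <> s2.id
      AND s1.name = s2.name
      AND s1.city = s2.city
    ORDER BY s1.id
""").fetchall()

print(len(pairs), len(distinct_rows))  # 8 5
```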
7
SELECT Field1, Field2, COUNT(*)
FROM table_name
GROUP BY Field1, Field2
HAVING COUNT(*) > 1

This lists every (Field1, Field2) combination that appears more than once, together with how many times it appears.


2

You have to self-join stuff, matching on both name and city, and then group and filter on the count.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.name AND s.city = p.city
)
group by s.name having count(s.name) > 1

1 Comment

Fails in SQL Server: all non-aggregate columns must be in the GROUP BY
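Here is a sketch of a self-join version that addresses the comment, run in SQLite via Python's sqlite3 module. It joins on name AND city, then groups by every selected column, so no non-aggregate column is left out of GROUP BY. Each row joins to itself as well, so COUNT(*) is the size of its (name, city) group, and "> 1" means the row is duplicated. ORDER BY is added only for stable output.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)

# Self-join on both columns (each row also matches itself), then group per row.
dupes = conn.execute("""
    SELECT s.id, s.name, s.city
    FROM stuff s
    JOIN stuff p ON s.name = p.name AND s.city = p.city
    GROUP BY s.id, s.name, s.city
    HAVING COUNT(*) > 1
    ORDER BY s.id
""").fetchall()

for row in dupes:
    print(row)
```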
2

From the question, the OP wants to group by some columns and still get additional columns that aren't grouping columns, so a regular GROUP BY + HAVING might not work.

I would use an EXISTS subquery with HAVING, putting the columns that define a duplicate into the subquery.

SELECT s.id, s.name,s.city 
FROM stuff s
WHERE EXISTS (
   SELECT 1
   FROM stuff ss
   WHERE 
      s.name = ss.name
   AND
      s.city = ss.city
   GROUP BY ss.name,ss.city
   HAVING COUNT(*) > 1
)

With a suitable index, this might perform better than a join:

CREATE INDEX IX_name ON stuff (
    name,
    city
);
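Below is a sketch of the EXISTS + HAVING query together with the composite index, run in SQLite through Python's sqlite3 module. It only checks that the query returns the expected rows. Whether the index actually beats a join depends on the engine's query planner and the data, which this sketch does not measure.

```python
import sqlite3

ROWS = [
    (904834, "jim", "London"),
    (904835, "jim", "London"),
    (90145, "Fred", "Paris"),
    (90132, "Fred", "Paris"),
    (90133, "Fred", "Paris"),
    (923457, "Barney", "New York"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO stuff VALUES (?, ?, ?)", ROWS)
# Composite index on the columns that define a duplicate.
conn.execute("CREATE INDEX IX_name ON stuff (name, city)")

# For each outer row, the subquery groups the rows sharing its name and city
# and yields a row only if that group has more than one member.
dupes = conn.execute("""
    SELECT s.id, s.name, s.city
    FROM stuff s
    WHERE EXISTS (
        SELECT 1
        FROM stuff ss
        WHERE s.name = ss.name
          AND s.city = ss.city
        GROUP BY ss.name, ss.city
        HAVING COUNT(*) > 1
    )
    ORDER BY s.id
""").fetchall()

for row in dupes:
    print(row)
```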

Another way is to use COUNT as a window function, putting the grouping columns in the PARTITION BY clause and then filtering on the count:

SELECT s.id, s.name,s.city 
FROM (
   SELECT *,COUNT(*) OVER(PARTITION BY name,city) cnt
   FROM stuff 
) s
WHERE cnt > 1

sqlfiddle


2

Another way to achieve the required output is CROSS APPLY:

select s.* from stuff s
cross apply(
    select name, city from stuff
    group by name, city
    having Count(*) > 1) x
where s.name = x.name and s.city=x.city


-1

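Given a staging table with 70 columns, of which only 4 determine whether a row is a duplicate, this query returns each combination of those 4 values (with surrounding whitespace trimmed) that occurs more than once, along with its count: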

SELECT 
    COUNT(*)
    ,LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
    LTRIM(RTRIM(S.TransactionDate)) 
    ,LTRIM(RTRIM(S.TransactionTime))
    ,LTRIM(RTRIM(S.TransactionTicketNumber)) 
    ,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1
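The reason for wrapping each key column in LTRIM(RTRIM(...)) can be seen in the sketch below. It runs in SQLite via Python's sqlite3 module. The table is a hypothetical, trimmed-down Stage with the four key columns plus one unrelated Notes column standing in for the other 66. All columns are stored as TEXT, and the rows are made up. Two rows differ only by stray spaces, so they count as duplicates only after trimming. The three-part name Staging.dbo.Stage is SQL Server specific and is replaced by a plain table name here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging table: 4 key columns plus one unrelated column.
conn.execute("""
    CREATE TABLE Stage (
        TransactionDate TEXT,
        TransactionTime TEXT,
        TransactionTicketNumber TEXT,
        GrossCost TEXT,
        Notes TEXT
    )
""")
conn.executemany(
    "INSERT INTO Stage VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-05", "10:15", "T1001 ", "12.50", "first load"),
        ("2024-01-05", " 10:15", "T1001", "12.50", "second load"),
        ("2024-01-06", "09:00", "T1002", "8.00", "unique"),
    ],
)

KEY_COLUMNS = ["TransactionDate", "TransactionTime", "TransactionTicketNumber", "GrossCost"]


def duplicate_keys(trim):
    """Group on the key columns only, optionally trimming whitespace first."""
    exprs = [f"LTRIM(RTRIM(S.{c}))" if trim else f"S.{c}" for c in KEY_COLUMNS]
    cols = ", ".join(exprs)
    sql = f"SELECT COUNT(*), {cols} FROM Stage S GROUP BY {cols} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


print(duplicate_keys(trim=False))  # []
print(duplicate_keys(trim=True))   # [(2, '2024-01-05', '10:15', 'T1001', '12.50')]
```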
