In SQL, find duplicates in one column with unique values for another column

Question

So I have a table of aliases linked to record ids. I need to find duplicate aliases with unique record ids. To explain better:

ID    Alias     Record ID
1     000123    4
2     000123    4
3     000234    4
4     000123    6
5     000345    6
6     000345    7

The result of a query on this table should be something to the effect of

000123    4    6
000345    6    7

Indicating that both record 4 and 6 have an alias of 000123 and both record 6 and 7 have an alias of 000345.

I was looking into using GROUP BY but if I group by alias then I can't select record id and if I group by both alias and record id it will only return the first two rows in this example where both columns are duplicates. The only solution I've found, and it's a terrible one that crashed my server, is to do two different selects for all the data and then join them

ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]

Are there any solutions out there that would work better? As in, not crash my server when run on a few hundred thousand records?

I hear "pivot" calling

Aniket Inge
– Aniket Inge

2013-07-08 22:17:53 +00:00
Commented Jul 8, 2013 at 22:17 — Aniket Inge
– Aniket Inge, Commented Jul 8, 2013 at 22:17

Ann L. · Accepted Answer · 2022-07-11 13:33:43Z

13

It looks as if you have two requirements:

Identify all aliases that have more than one record id, and
List the record ids for these aliases horizontally.

The first is a lot easier to do than the second. Here's some SQL that ought to get you where you want with the first:

WITH A   -- Get a list of unique combinations of Alias and [Record ID]
AS  (
   SELECT Distinct
          Alias
     ,    [Record ID]
   FROM  T1
)
,   B  -- Get a list of all those Alias values that have more than one [Record ID] associated
AS  (
    SELECT Alias
    FROM   A
    GROUP BY
           Alias
    HAVING COUNT(*) > 1
)
SELECT  A.Alias
    ,   A.[Record ID]
FROM    A
    JOIN B
        ON  A.Alias = B.Alias

Now, as for the second. If you're satisfied with the data in this form:

Alias     Record ID
000123    4
000123    6
000345    6
000345    7

... you can stop there. Otherwise, things get tricky.

The PIVOT command will not necessarily help you, because it's trying to solve a different problem than the one you have.

I am assuming that you can't necessarily predict how many duplicate Record ID values you have per Alias, and thus don't know how many columns you'll need.

If you have only two, then displaying each of them in a column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? A web page? Excel?) might be able to do a better job of displaying them horizontally than SQL Server can do in returning them arranged horizontally.

edited Jul 11, 2022 at 13:33

answered Jul 8, 2013 at 22:39

Ann L.

14k5 gold badges39 silver badges67 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

user1301877 Over a year ago

If A is defined with Select Distinct, how would B contain anything? It is pulling from A with Count > 1 but everything in A is distinct...

Ann L. Over a year ago

A has Alias and [Record ID], so the DISTINCT will make sure that you have distinct combinations of those two. B groups by just Alias, so if you have more than one case in A of the same Alias (but with different [Record ID] values), that Alias will be returned. The final query combines them: show those values from A that appear in B: the combinations of Alias and [Record ID] that belong to Alias values that had more than one [Record ID] associated with them.

robert woods Over a year ago

How could I run an update on the results of this query? Eg. change RecordID=4 where RecordID=2 just in these results.

Ann L. Over a year ago

@robertwoods Sorry it's taken me a LONG time to get back to you. Do you still have a question, or are things taken care of?

robodisco · Accepted Answer · 2022-10-11 08:33:32Z

I agree with Ann L's answer but would like to show how you can use window functions with CTE's as you may prefer the readability.

(Re: how to pivot horizontally, I again agree with Ann)

create temporary table things (
  id serial primary key,
   alias varchar,
   record_id int
)

insert into things (alias, record_id) values 
('000123', 4),
('000123', 4),
('000234', 4),
('000123', 6),
('000345', 6),
('000345', 7);

with 
    things_with_distinct_aliases_and_record_ids as (
        select distinct on (alias, record_id)
               id, 
             alias, 
             record_id 
             from things
    ),
    things_with_unique_record_id_counts_per_alias as (
        select *,
             COUNT(*) OVER(PARTITION BY alias) as unique_record_ids_count
             from things_with_distinct_aliases_and_record_ids
  ) 

select * from things_with_unique_record_id_counts_per_alias
       where unique_record_ids_count > 1

The first CTE gets all the unique alias/record id combinations. E.g.

 id | alias  | record_id
----+--------+-----------
  1 | 000123 |         4
  4 | 000123 |         6
  3 | 000234 |         4
  5 | 000345 |         6
  6 | 000345 |         7

The second CTE simply creates a new column for the above and adds the count of record ids for each alias. This allows you to filter only those aliases which have more than one record id associated with them.

 id | alias  | record_id | unique_record_ids_count
----+--------+-----------+-------------------------
  1 | 000123 |         4 |                       2
  4 | 000123 |         6 |                       2
  3 | 000234 |         4 |                       1
  5 | 000345 |         6 |                       2
  6 | 000345 |         7 |                       2

Curt · Accepted Answer · 2013-07-08 23:32:10Z

0

Perhaps what you want is just the min() and max() of RecordId:

select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)

You can also count the number of distinct values, using count(distinct):

select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;

edited Jul 8, 2013 at 23:32

Curt

5,7511 gold badge25 silver badges36 bronze badges

answered Jul 8, 2013 at 23:00

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

1 Comment

Curt Over a year ago

Since she's specifically looking for duplicates, I would add to your query: " HAVING Count(distinct RecordId) > 1 "

rmuc8 · Accepted Answer · 2015-06-16 11:04:20Z

0

This will give all repeated values:

select Alias, count(RecordId) as NumRecordIds,  
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);

edited Jun 16, 2015 at 11:04

rmuc8

2,9977 gold badges29 silver badges38 bronze badges

answered Jun 16, 2015 at 10:13

user5014915

11 bronze badge

Comments

Esset · Accepted Answer · 2022-10-14 18:28:05Z

-2

SELECT A.CitationId,B.CitationId, A.CitationName, A.LoaderID, A.PrimaryReferenceLoaderID,B.SecondaryReference1LoaderID, A.SecondaryReference1LoaderID, A.SecondaryReference2LoaderID, 
 A.SecondaryReference3LoaderID, A.SecondaryReference4LoaderID, A.CreatedOn, A.LastUpdatedOn 
FROM CitationMaster A, CitationMaster B
WHERE A.PrimaryReferenceLoaderID= B.SecondaryReference1LoaderID and Isnull(A.PrimaryReferenceLoaderID,'') != '' and Isnull(B.SecondaryReference1LoaderID,'') !=''

edited Oct 14, 2022 at 18:28

Esset

1,0562 gold badges16 silver badges18 bronze badges

answered Oct 11, 2022 at 6:46

Pushpraj Jagadale

1

Collectives™ on Stack Overflow

In SQL, find duplicates in one column with unique values for another column

5 Answers 5

4 Comments

Comments

1 Comment

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

4 Comments

Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related