SQL Server: Find duplicate substrings in one column

Question

I have a table of clients in SQL Server. I'm trying to find a way to detect duplicates in the email_address column, but I only need to consider part of the column data, i.e. a substring. In practical terms, I need to find duplicate domain names in the records.

I have used the following query to find exact duplicates (on the whole field), but how can I modify this to consider a substring?

SELECT a.email_address, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
    SELECT email_address, COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY email_address
    HAVING COUNT(*) > 1
) b ON a.email_address = b.email_address

Many thanks!

How about you try something, if you already suspect you need to use substring?
– Mihai, Commented Sep 9, 2014 at 15:05
Just a side note: a pivot might perform better for the data you're aiming to get.
– CodeMonkey1313, Commented Sep 9, 2014 at 15:07
Try joining on the matching substrings within the email address.
– Michael McGriff, Commented Sep 9, 2014 at 15:07

Katherine Elizabeth Lightsey · Accepted Answer · 2014-09-10 19:42:51Z

Try this:

declare @contact table (
  [client_id] [int] identity(1, 1)
  , [email]   [sysname]
  );
insert into @contact
        ([email])
values      (N'joe@billy_bobs.com'),
        (N'[email protected]'),
        (N'george@billy_bobs.com');
with [stripper]
 as (select [client_id]
            , [email]
            , substring([email]
                        , charindex(N'@', [email], 0) + 1
                        , len([email])) as [domain_name]
     from   @contact),
 [duplicate_finder]
 as (select [client_id]
            , [domain_name]
            , row_number()
                over (
                  partition by [domain_name]
                  order by [domain_name]) as [sequence]
     from   [stripper])
select *
from   [duplicate_finder]
where  [sequence] > 1;
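
For comparison, the same domain extraction could probably be dropped straight into the query from the question, so that every row sharing a domain is returned along with its count. This is only a sketch: tblClient, client_id and email_address come from the question, the domain_name alias is made up for illustration, and it assumes the domain is everything after the first '@'. Rows with no '@' would be grouped on the whole address.

```sql
-- Sketch: list every client whose email domain occurs more than once.
-- Table and column names are taken from the question; domain_name is an illustrative alias.
SELECT a.client_id, a.email_address, b.domain_name, b.dupeCount
FROM tblClient a
INNER JOIN (
    -- Everything after the first '@' is treated as the domain.
    SELECT SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address)) AS domain_name,
           COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY SUBSTRING(email_address, CHARINDEX('@', email_address) + 1, LEN(email_address))
    HAVING COUNT(*) > 1
) b ON SUBSTRING(a.email_address, CHARINDEX('@', a.email_address) + 1, LEN(a.email_address)) = b.domain_name
ORDER BY b.domain_name, a.client_id;
```

Unlike the row_number() version, which skips the first row of each domain, this returns all rows in each duplicate group.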

edited Sep 10, 2014 at 19:42

answered Sep 9, 2014 at 15:49


2 Comments

Adam92 Over a year ago

Thanks for your response. I'm not actually looking to delete the duplicate records, how can I get this to work without the delete part?

Katherine Elizabeth Lightsey Over a year ago

Adam, I updated to just a select statement to reflect your question.

Milen · Accepted Answer · 2014-09-09 15:23:12Z

gee:

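-- Count rows per two-character prefix; SQL Server needs the expression repeated in GROUP BY
SELECT SUBSTRING(email_address, 1, 2) AS prefix, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY SUBSTRING(email_address, 1, 2)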

edited Sep 9, 2014 at 15:23 by Milen

answered Sep 9, 2014 at 15:16 by qqq

1 Comment

zelusp Over a year ago

How would you modify this query to get all the rows associated with those unique substrings?
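
One possible way to do that, as a sketch rather than a tested answer, is a windowed COUNT(*) so each row carries the size of its substring group and no join is needed. The two-character prefix is kept from the query above; tblClient, client_id and email_address come from the question, while the CTE name and the prefix and dupeCount aliases are illustrative.

```sql
-- Sketch: return every row whose two-character prefix is shared with at least one other row.
WITH prefixed AS (
    SELECT client_id,
           email_address,
           SUBSTRING(email_address, 1, 2) AS prefix,
           -- Number of rows in the same prefix group, repeated on each row.
           COUNT(*) OVER (PARTITION BY SUBSTRING(email_address, 1, 2)) AS dupeCount
    FROM tblClient
    -- NULL addresses would otherwise all land in one group.
    WHERE email_address IS NOT NULL
)
SELECT client_id, email_address, prefix, dupeCount
FROM prefixed
WHERE dupeCount > 1
ORDER BY prefix, client_id;
```

Swapping the SUBSTRING(email_address, 1, 2) expression for a domain expression based on CHARINDEX('@', email_address) should give the same result per domain instead of per prefix.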
