
I have a table of clients in SQL Server. I'm trying to find a way to detect duplicates in the email_address column, but I need to compare only part of the column data, i.e. a substring. In practical terms, I need to find records that share the same domain name.

I have used the following query to find exact duplicates (on the whole field), but how can I modify this to consider a substring?

SELECT a.email_address, b.dupeCount, a.client_id
FROM tblClient a
INNER JOIN (
    SELECT email_address, COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY email_address
    HAVING COUNT(*) > 1
) b ON a.email_address = b.email_address

Many thanks!

3 Comments
  • How about you try something, if you already suspect you need to use a substring? Commented Sep 9, 2014 at 15:05
  • Just a side note: a pivot might perform better for the data you're aiming to get. Commented Sep 9, 2014 at 15:07
  • Try joining on the matching substrings within the email address. Commented Sep 9, 2014 at 15:07

2 Answers


try this:

declare @contact table (
  [client_id] [int] identity(1, 1)
  , [email]   [sysname]
  );
insert into @contact
        ([email])
values      (N'joe@billy_bobs.com'),
        (N'[email protected]'),
        (N'george@billy_bobs.com');
with [stripper]
 as (select [client_id]
            , [email]
            , substring([email]
                        , charindex(N'@', [email], 0) + 1
                        , len([email])) as [domain_name]
     from   @contact),
 [duplicate_finder]
 as (select [client_id]
            , [domain_name]
            , row_number()
                over (
                  partition by [domain_name]
                  order by [domain_name]) as [sequence]
     from   [stripper])
select [client_id]
       , [domain_name]
       , [sequence]
from   [duplicate_finder]
where  [sequence] > 1;

2 Comments

Thanks for your response. I'm not actually looking to delete the duplicate records. How can I get this to work without the delete part?
Adam, I updated to just a select statement to reflect your question.
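One thing worth noting about the select-only version: filtering on [sequence] > 1 returns only the second and later rows of each domain, not the first one. If every row in a duplicated domain is wanted, a windowed COUNT(*) is one possible variation. The following is only a sketch, not the answerer's method; the @client table variable, its sample rows, and the [domains] and [counted] CTE names are made up for illustration.

```sql
-- Hypothetical sample data; in practice this would be tblClient.
DECLARE @client TABLE (
    client_id     int IDENTITY(1, 1),
    email_address nvarchar(128)
);
INSERT INTO @client (email_address)
VALUES (N'joe@billy_bobs.com'),
       (N'ann@example.org'),
       (N'george@billy_bobs.com');

WITH domains AS (
    -- Everything after the first '@' is treated as the domain.
    SELECT client_id,
           email_address,
           SUBSTRING(email_address,
                     CHARINDEX(N'@', email_address) + 1,
                     LEN(email_address)) AS domain_name
    FROM @client
),
counted AS (
    -- Attach the per-domain row count to every row.
    SELECT client_id,
           email_address,
           domain_name,
           COUNT(*) OVER (PARTITION BY domain_name) AS dupeCount
    FROM domains
)
SELECT client_id, email_address, domain_name, dupeCount
FROM counted
WHERE dupeCount > 1          -- keep all rows of duplicated domains
ORDER BY domain_name, client_id;
```

With the sample rows above, this should return both billy_bobs.com rows with a dupeCount of 2. Comparison follows the column's collation, so on a case-insensitive collation 'Example.org' and 'example.org' count as the same domain.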

gee:

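-- SQL Server has SUBSTRING (not substr) and does not accept GROUP BY 1
SELECT SUBSTRING(email_address, 1, 2) AS prefix, COUNT(*) AS dupeCount
FROM tblClient
GROUP BY SUBSTRING(email_address, 1, 2)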

1 Comment

How would you modify this query to get all the rows associated with those unique substrings?
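One way to get the individual rows back is to keep the shape of the query from the question and join on the extracted substring instead of the whole column. This is a rough sketch that assumes the tblClient table and the client_id and email_address columns from the question, and that the domain is whatever follows the first '@'; the domain_name alias is only illustrative.

```sql
-- Same structure as the query in the question, but grouped and joined
-- on the part of email_address after the '@' rather than the full value.
SELECT a.client_id,
       a.email_address,
       b.domain_name,
       b.dupeCount
FROM tblClient a
INNER JOIN (
    SELECT SUBSTRING(email_address,
                     CHARINDEX('@', email_address) + 1,
                     LEN(email_address)) AS domain_name,
           COUNT(*) AS dupeCount
    FROM tblClient
    GROUP BY SUBSTRING(email_address,
                       CHARINDEX('@', email_address) + 1,
                       LEN(email_address))
    HAVING COUNT(*) > 1
) b ON SUBSTRING(a.email_address,
                 CHARINDEX('@', a.email_address) + 1,
                 LEN(a.email_address)) = b.domain_name
ORDER BY b.domain_name, a.client_id;
```

Two caveats: a value with no '@' makes CHARINDEX return 0, so the whole string is treated as its "domain", and rows with a NULL email_address drop out of the join. Applying a function to the column also means an index on email_address is unlikely to help, so on a large table a persisted computed column for the domain may be worth considering.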
