Find duplicates across different columns with same information

Question

I've have tried googling this, but with no luck. All the answers I could find was with different information across the columns. In this case I have the same information (phone number) in multiple columns.

I have a table with customers and their phone numbers. Each customer have up to three different phone numbers in three different columns (home, work and mobile phone). A mobile number can be used as a home number. Some customers have the same phone number as other customers because they are husband and wife or parent and child. This is not okay in this case.

Therefore I need to search for dublicates in both the mobile phone and the home phone column. (The work phone number is ignored here.)

Example table:

ID / Phone home / Phone mobile
1 / 12345678 / 98765432
2 / 12345678 / 22222222
3 / 23456789 / 33333333
4 / 33333333 / 87654321

In the above table I would need the query to return all four customers. ID 1 and ID 2 because they have the same home phone number and ID 3 and ID 4 because they have the same number (33333333) in the home and mobile column.

I need the customer ID's of the duplicates at least, but I would like to see the duplicate phone number as well.

Thank you!

Tim · Accepted Answer · 2017-08-14 12:06:51Z

1

My approach would be to normalize the query into just Id and Phone, then do a self join to find the duplicate:

WITH [CTE] AS (
    SELECT Id, HomePhone as Phone FROM customers
    UNION 
    SELECT Id, WorkPhone as Phone FROM customers
    UNION
    SELECT Id, MobilePhone as Phone FROM customers
) 
SELECT c1.Id, c2.Id, [c1].[Phone] 
FROM [CTE] AS [c1] INNER JOIN [CTE] AS [c2] 
   ON [c1].[Phone] = [c2].[Phone] AND [c1].[Id] <> [c2].[Id]

answered Aug 14, 2017 at 12:06

Tim

6,0681 gold badge16 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Mette Over a year ago

Thank you. This ended up solving my problem. I did try something like this before, but I didn't get the inner join part right on my own.

Gordon Linoff · Accepted Answer · 2017-08-14 12:00:19Z

1

I think you can do what you want using exists:

select t.*
from t
where exists (select 1
              from t t2
              where t2.id <> t.id and
                    (t2.homephone in (t.homephone, mobilephone) or
                     t2.mobilephone in (t.homephone, mobilephone)
                    )
             );

Another approach might have better performance:

select tp.*
from (select t.id, v.phone,
             count(*) over (partition by id) as cnt
      from t cross apply
           (values (t.homephone), (t.mobilephone)) v(phone)
     ) tp
where cnt > 1;

This version will actually return rows that have the same phone in two columns. This is easily fixed, if that could be an issue.

answered Aug 14, 2017 at 12:00

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

1 Comment

Mette Over a year ago

Thank you for pointing me towards EXISTS. I didn't know that. However, there is a lot of duplicates, meaning the query just gives me all the rows from the customer table anyway. I didn't try the cross apply solution, because I succeeded with one of the other answers.

James Casey · Accepted Answer · 2017-08-14 12:17:43Z

1

You could unpivot the data then do a self join:

CREATE TABLE #phone
(
    Id INT,
    HomePhone VARCHAR(20),
    MobilePhone VARCHAR(20)
)

INSERT INTO #phone ( Id, HomePhone, MobilePhone ) VALUES(1, 12345678, 98765432);
INSERT INTO #phone ( Id, HomePhone, MobilePhone ) VALUES(2, 12345678, 22222222);
INSERT INTO #phone ( Id, HomePhone, MobilePhone ) VALUES(3, 23456789, 33333333);
INSERT INTO #phone ( Id, HomePhone, MobilePhone ) VALUES(4, 33333333, 87654321);

WITH unpivoted AS 
(
    SELECT Id, Phone, PhoneType
    FROM #phone
    UNPIVOT
    (
        Phone FOR PhoneType IN (HomePhone, MobilePhone)
    ) AS unpiv
)
SELECT a.id, a.Phone, a.PhoneType, b.Id, b.Phone, b.PhoneType
FROM unpivoted A
INNER JOIN unpivoted B ON A.Phone = B.Phone AND B.Id > A.Id

returns

1   12345678    HomePhone   2   12345678    HomePhone
3   33333333    MobilePhone 4   33333333    HomePhone

answered Aug 14, 2017 at 12:17

James Casey

2,5071 gold badge14 silver badges19 bronze badges

1 Comment

Mette Over a year ago

Thank you for showing me the unpivot solution. I didn't know that before. I had som trouble with the where clauses I needed to add, which is why this did not end up as being my end result.

Rumpelstinsk · Accepted Answer · 2017-08-14 11:58:56Z

0

It would be helpfull if you provide an example of the output data. However try with EXISTS : https://msdn.microsoft.com/es-es/library/ms188336(v=sql.120).aspx

SELECT *
FROM customers c
WHERE EXISTS (SELECT * 
             FROM customers c2 
             WHERE c.Id != c2.Id 
             AND (
                   (c.HomePhone = c2.HomePhone || c.HomePhone = c2.WorkPhone || c.HomePhone = c2.MobilePhone)
                    || (c.WorkPhone = c2.HomePhone || c.WorkPhone = c2.WorkPhone || c.WorkPhone = c2.MobilePhone)
                    || (c.MobilePhone= c2.HomePhone || c.MobilePhone= c2.WorkPhone || c.MobilePhone= c2.MobilePhone)
             )

answered Aug 14, 2017 at 11:58

Rumpelstinsk

3,2514 gold badges37 silver badges60 bronze badges

1 Comment

Mette Over a year ago

Thank you for pointing me towards 'EXISTS'. I didn't know that. However, there is a lot of duplicates, meaning the query just gives me all the rows from the customer table anyway.

Collectives™ on Stack Overflow

Find duplicates across different columns with same information

4 Answers 4

1 Comment

1 Comment

1 Comment

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

1 Comment

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related