3

I have a table People (First_Name, Last_Name). This table has records that are duplicated as in the example (not all rows are duplicated):

First_Name  Last_Name
John        Smith
Alec        Baldwin
Smith       John
Angelo      Gordon
Mary        Bush
Bush        Mary

How do I select all distinct people? In the final output of the query John Smith should appear only once (it’s not import if in the final query there is John Smith or Smith John).

Thank you.

0

6 Answers 6

5

Just pick an ordering and apply it across everyone. Then use a union that will eliminate duplicates anyway

select FirstName,LastName from People where FirstName <= LastName
union
select LastName,FirstName from People where LastName < FirstName
Sign up to request clarification or add additional context in comments.

4 Comments

Nice one. Would you use this version in RL over the answer I posted? (And why?)
@Tobias - Well, it was the first answer I thought of, and I didn't see anyone else posting similar. I'd note that yours does depend on a special character (" ") not appearing within these columns, which may or may not be relevant. Depending on the size of the table, and the intelligence of the optimizer, this may lead to two scans of the table, but I think yours has the same problems going for it - that subquery is going to scan the whole table I think.
Clean idea. Thumb up! Maybe it can have performance problems but I think other solutions would have bigger troubles with that, too.
Thanks for both the answer and the clarification comment.
3

This is one way to do it using pretty much any SQL flavor.

DECLARE @Names TABLE (
  First_Name VARCHAR(32)
  , Last_Name VARCHAR(32)
)  

INSERT INTO @Names VALUES ('John', 'Smith')
INSERT INTO @Names VALUES ('Alec', 'Baldwin')
INSERT INTO @Names VALUES ('Smith', 'John')
INSERT INTO @Names VALUES ('Angelo', 'Gordon')
INSERT INTO @Names VALUES ('Mary', 'Bush')
INSERT INTO @Names VALUES ('Bush', 'Mary')

Using a JOIN

SELECT  n1.*
FROM    @Names n1
        LEFT OUTER JOIN @Names n2 ON n2.First_Name = n1.Last_Name
                                     AND n2.Last_Name = n1.First_Name
                                     AND n2.First_Name < n1.First_Name
WHERE   n2.First_Name IS NULL                                     

or NOT EXISTS

SELECT  n1.*
FROM    @Names n1
WHERE   NOT EXISTS (
          SELECT  *  
          FROM    @Names n2 
          WHERE   n2.First_Name = n1.Last_Name
                  AND n2.Last_Name = n1.First_Name
                  AND n2.First_Name < n1.First_Name
        )

Comments

1

Sorry was missundertanding your question on the first try...

WITH People (Firstname, Lastname)
AS
(
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'Alec' AS Firstname, 'Baldwin' AS Lastname UNION
    SELECT 'Smith' AS Firstname, 'John' AS Lastname UNION
    SELECT 'John' AS Firstname, 'Smith' AS Lastname UNION
    SELECT 'Angelo' AS Firstname, 'Gordon' AS Lastname UNION
    SELECT 'Mary' AS Firstname, 'Bush' AS Lastname UNION
    SELECT 'Bush' AS Firstname, 'Mary' AS Lastname
)
SELECT p1.* FROM People p1 
LEFT OUTER JOIN People p2 ON p2.Firstname = p1.Lastname AND p2.Lastname = p1.Firstname AND p2.Firstname < p1.Firstname 
WHERE p2.Firstname IS NULL  

3 Comments

Are you ignoring that 'Mary Bush' should be grouped with 'Bush Mary'?
No. the first_name and last_name attributes need to louse their logic. In the final answer there must be a John Smith record for both John Smith and Smith John initial records.
Sorry my fault. Missunderstood the question completely.
0

Here's a solution which uses Oracle functions. Other flavours of SQL will have the same or very similar functions:

SQL> select * from t23
  2  /

FIRST_NAME                     LAST_NAME
------------------------------ ------------------------------
John                           Smith
Alec                           Baldwin
Smith                          John
Angelo                         Gordon
Mary                           Bush
Bush                           Mary

6 rows selected.

SQL> select distinct least(first_name, last_name)
  2                  , greatest(first_name, last_name)
  3  from t23
  4  /

LEAST(FIRST_NAME,LAST_NAME)    GREATEST(FIRST_NAME,LAST_NAME)
------------------------------ ------------------------------
Alec                           Baldwin
Bush                           Mary
John                           Smith
Angelo                         Gordon

SQL> 

Comments

0

I think this might work in MS-SQL

select * from People

where (FirstName + "," + LastName) <> (LastName + "," + FirstName)

Comments

0

Another sugestion

Temporary Table:

DECLARE @Names TABLE (
  First_Name VARCHAR(32)
  , Last_Name VARCHAR(32)
)  

INSERT INTO @Names VALUES ('John', 'Smith')
INSERT INTO @Names VALUES ('Alec', 'Baldwin')
INSERT INTO @Names VALUES ('Smith', 'John')
INSERT INTO @Names VALUES ('Angelo', 'Gordon')
INSERT INTO @Names VALUES ('Mary', 'Bush')
INSERT INTO @Names VALUES ('Bush', 'Mary')

Using CASE

SELECT DISTINCT 
    CASE WHEN First_Name <= Last_Name THEN First_Name ELSE Last_Name END AS First_Name,
    CASE WHEN First_Name <= Last_Name THEN Last_Name ELSE First_Name END AS Last_Name
FROM @Names

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.