Removing duplicate combinations from result set in SQL Server

Question

I have a table with two columns with data like this:

1,2
1,3
1,4
2,1
2,2
3,1

I want to select just unique combinations, so out of those I would end up with:

1,2
1,3
1,4
2,2

because 1,2 is the same combination as 2,1, and so on.

How would I go about that in a SQL statement?

In reality, my table has a third column, and I want to add a WHERE clause on that third column so that only the matching rows are considered.

@Mark I don't mean to bash any more, but he was right: your query did not come close to solving the OP's problem. I believe it's more important to try your solutions to see whether you can at least get the same result set the OP asked for.
– Andrei Dvoynos, Commented Oct 19, 2012 at 15:03

AakashM · Accepted Answer · 2012-10-19 14:50:23Z

10

SELECT * FROM (
    SELECT
        CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Col1,
        CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Col2
    FROM
        MyTable
) Ordered
GROUP BY
    Col1, Col2

You could do it without the subquery by GROUPing on the CASE expressions, but it's longer to read.
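For comparison, here is a sketch of the subquery-free form mentioned above. It reuses the placeholder names MyTable, Col1 and Col2 from the answer and repeats each CASE expression in the GROUP BY, which is why it is longer to read.

```sql
-- Group directly on the normalising CASE expressions instead of
-- wrapping them in a derived table. MyTable, Col1 and Col2 are the
-- placeholder names used in the answer above.
SELECT
    CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Col1,  -- smaller value first
    CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Col2   -- larger value second
FROM
    MyTable
GROUP BY
    CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END,
    CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END;
```

SELECT DISTINCT over the same two CASE expressions should give the same rows without a GROUP BY.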

answered Oct 19, 2012 at 14:50

AakashM


7 Comments

Graham Over a year ago

This works for the simple case I put in my question, but when I add a WHERE clause it doesn't. Where would I put the WHERE clause(s)?

Aaron Bertrand Over a year ago

@Graham if you want a solution that works for your scenario, please post your actual scenario in the question. Dumbing it down clearly doesn't help anyone.

AakashM Over a year ago

"It doesn't work" doesn't explain the problem enough. You need to elaborate on your input, expected and actual outcomes.

Graham Over a year ago

fair point, I have edited my original question. I didn't think it would make any difference, but it clearly does

AakashM Over a year ago

With your condition, you should filter in the innermost SELECT.
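A minimal sketch of what filtering in the innermost SELECT might look like. The column Col3 and the value 'X' are hypothetical stand-ins for the third column and condition described in the question. The filter has to sit inside the derived table because the outer query only sees the two normalised columns.

```sql
-- Col3 and 'X' are assumptions for illustration; substitute the real
-- third column and condition.
SELECT Col1, Col2 FROM (
    SELECT
        CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Col1,
        CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Col2
    FROM
        MyTable
    WHERE
        Col3 = 'X'      -- filter rows before the pairs are normalised
) Ordered
GROUP BY
    Col1, Col2;
```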

ypercubeᵀᴹ · Answer · 2012-10-19 15:33:25Z

7

Another way to achieve the same thing:

SELECT a, b
FROM tableX
WHERE a <= b
  AND (other conditions)

UNION 

SELECT b, a
FROM tableX 
WHERE a > b 
  AND (other conditions) ;

This variation may be different (regarding efficiency), depending on the indexes you have:

SELECT *
FROM
  ( SELECT a, b
    FROM tableX
    WHERE (other conditions)
  UNION 
    SELECT b, a
    FROM tableX 
    WHERE (other conditions)
  ) AS tmp
WHERE a <= b ;

edited Oct 19, 2012 at 15:33

answered Oct 19, 2012 at 15:10

ypercubeᵀᴹ

7 Comments

Graham Over a year ago

This works, but without the distinct on SQL Server 2008. The execution plan shows AakashM's answer as faster (45% versus 55%).

Graham Over a year ago

On my real data (with the where clause) AakashM gets 32% over 68%, but I have to admit I understand yours :-)

Aaron Bertrand Over a year ago

@Graham much more meaningful to measure duration, cpu, I/O etc. rather than the estimated cost percentage in the plan. I've seen plans with lower estimated cost take 10x as long to run - even when the estimated cost has come from an actual plan. Have you looked at SQL Sentry Plan Explorer?
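One way to collect those numbers in plain T-SQL is a sketch like the one below. SET STATISTICS TIME and SET STATISTICS IO are standard session options whose output appears with the query messages. The two candidate queries are the approaches from this thread rewritten against the same placeholder table (MyTable, Col1, Col2), which is an assumption made here so that the results are comparable.

```sql
-- Report CPU time, elapsed time and logical reads for each statement.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Candidate 1: normalise each pair with CASE, then group.
SELECT Col1, Col2 FROM (
    SELECT
        CASE WHEN Col1 <= Col2 THEN Col1 ELSE Col2 END AS Col1,
        CASE WHEN Col1 <= Col2 THEN Col2 ELSE Col1 END AS Col2
    FROM MyTable
) Ordered
GROUP BY Col1, Col2;

-- Candidate 2: UNION of the already-ordered rows and the swapped rows.
SELECT Col1, Col2 FROM MyTable WHERE Col1 <= Col2
UNION
SELECT Col2, Col1 FROM MyTable WHERE Col1 > Col2;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running each candidate several times and comparing the reported CPU time, elapsed time and logical reads gives a fairer picture than the cost percentages in the plan.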

Graham Over a year ago

The second method is the same efficiency as the first (on the real data)

Aaron Bertrand Over a year ago

@Graham there is a free version, did you look closely? There are two versions of Plan Explorer: FREE and PRO. The pro version has some additional features but they're not needed for basic performance comparisons.

Allan Ramírez · Answer · 2012-10-19 16:10:01Z

-1

You can try something like:

select distinct col1, col2 from your_table
where col2 + '-' + col1 not in (select col1 + '-' + col2 from your_table)

Note that you have to concatenate the fields, and how you do so depends on the column type (col1 + '-' + col2 works well with char and varchar types).

edited Oct 19, 2012 at 16:10

answered Oct 19, 2012 at 14:49

Allan Ramírez

7 Comments

Graham Over a year ago

This, if it works, is taking far too long to run - still waiting for results

Lamak Over a year ago

This can fail for some results, since it will see the row 23,1 as the same as 2,31

Aaron Bertrand Over a year ago

Now it will fail for "5-0,x" and "5,0-x". SQL Server has ways to determine unique without relying on concatenation, which requires a whole bunch of assumptions about the data in order to be trusted. Note that the column is clearly not an integer otherwise it would fail with conversion errors...

Allan Ramírez Over a year ago

Obviously you have to choose the separator character(s) according to your stored data, and do the conversion according to the data type of your columns in order to concatenate the column values.

Allan Ramírez Over a year ago

e.g. in Oracle you can use a syntax like where (col2, col1) not in (select col1, col2 from your_table), but you can't do that in SQL Server
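In SQL Server, a multi-column comparison like that can be written as a correlated NOT EXISTS, which avoids concatenation and separator characters altogether. The sketch below is not the query from this answer. It adds a col1 <= col2 branch so that one row of each mirrored pair survives, and it reuses the placeholder names your_table, col1 and col2.

```sql
-- Keep a row if it is already in (smaller, larger) order, or if its
-- mirrored counterpart does not exist in the table at all.
SELECT DISTINCT t.col1, t.col2
FROM your_table AS t
WHERE t.col1 <= t.col2
   OR NOT EXISTS (
        SELECT 1
        FROM your_table AS r
        WHERE r.col1 = t.col2   -- compare the columns pairwise,
          AND r.col2 = t.col1   -- with no string concatenation
   );
```

Unlike the CASE-based approach, a pair stored only as 3,1 is returned as 3,1 rather than rewritten to 1,3. Any extra filter on a third column would need to be applied to both t and r.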

Mark Kram · Answer · 2012-10-19 14:43:33Z

-9

How about:

SELECT
      COL1, COL2, COUNT(*)
FROM
     Your_Table
GROUP BY
      COL1, COL2

answered Oct 19, 2012 at 14:43

Mark Kram

5 Comments

Lamak Over a year ago

This isn't what the OP wants; he needs the combinations of Col1-Col2 or Col2-Col1 to be unique

Andrei Dvoynos Over a year ago

How is this even a solution? What do you achieve with the count?

Mark Kram Over a year ago

How does my solution NOT answer the OP's question? sqlfiddle.com/#!3/d8f44/1

Lamak Over a year ago

Well, your result has nothing to do with the result set asked for. So, it doesn't answer his question at all

Zane Over a year ago

Hey Mark, I checked your SQL Fiddle and it has results (1,2) and (2,1), which is exactly what the OP said he was looking to avoid. Also, why would @Lamak post another answer when AakashM already has the correct answer on the board? Calm down, fella. Just because your particular solution will not work is no reason to get defensive.
