4

I want a count that treats two rows as the same when their columns hold the same pair of values, regardless of order, so that such rows are counted only once.

My table is:

CREATE TABLE dbo.Messages
(
    FromUserId INT,
    ToUserId INT
);

Data:

INSERT dbo.Messages VALUES(1, 5), (2, 20), (5, 1), (1, 5);

The count should return 2, because (1,5) and (5,1) are the same pair in my algorithm.

How can I write this in T-SQL for SQL Server?

Thanks in advance.

4 Answers

2

This works quite well:

CREATE TABLE #Messages 
(
    FromUserId INT,
    ToUserId INT
);

INSERT #Messages VALUES(1, 5), (2, 20), (5, 1), (1, 5);

SELECT COUNT(*)
FROM (
  SELECT M1.FromUserId, M1.ToUserId
  FROM #Messages AS M1
  EXCEPT
  SELECT M2.ToUserId, M2.FromUserId
  FROM #Messages AS M2
  WHERE M2.ToUserId > M2.FromUserId
  ) AS T;

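The derived table uses EXCEPT to remove the reversed pairs: the second query swaps the columns of every row where ToUserId > FromUserId, so a row stored in descending order is dropped whenever its ascending counterpart exists. The outer query then counts what is left. Keep in mind that there's no need for the DISTINCT keyword here, because EXCEPT returns only distinct rows.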

Results from derived table:

FromUserId ToUserId 
---------- -------- 
1          5        
2          20   

You can check how this query works here: https://data.stackexchange.com/stackoverflow/query/524634/counting-unique-values
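To see how the WHERE filter behaves when a pair exists in only one direction, here is a minimal sketch that runs the same derived table on a hypothetical data set. The table variable @Pairs, its rows, and the alias UniquePairCount are assumptions made for illustration and are not part of the original answer.

```sql
-- Hypothetical sample data: (1,5) appears in both directions,
-- (9,3) appears only in descending order.
DECLARE @Pairs TABLE (FromUserId INT, ToUserId INT);
INSERT @Pairs VALUES (1, 5), (5, 1), (9, 3);

SELECT COUNT(*) AS UniquePairCount
FROM (
  SELECT P1.FromUserId, P1.ToUserId
  FROM @Pairs AS P1
  EXCEPT
  -- Only ascending rows are swapped, so only (5,1) is subtracted.
  SELECT P2.ToUserId, P2.FromUserId
  FROM @Pairs AS P2
  WHERE P2.ToUserId > P2.FromUserId
  ) AS T;
```

The derived table keeps (1,5) and (9,3), so the count is 2. The row (9,3) survives because no ascending (3,9) row exists to produce a swapped match.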

2

One way to go about this is to group by the least and greatest of FromUserId and ToUserId, using the distinct values from your original table. Since SQL Server versions before SQL Server 2022, unlike MySQL, do not have LEAST and GREATEST functions, we can use CASE expressions instead.

SELECT CASE WHEN t.FromUserId < t.ToUserId THEN t.FromUserId ELSE t.ToUserId END,
       CASE WHEN t.FromUserId < t.ToUserId THEN t.ToUserId   ELSE t.FromUserId END,
       COUNT(*) AS duplicateCount
FROM
(
    SELECT DISTINCT FromUserId, ToUserId
    FROM dbo.Messages
) t
GROUP BY CASE WHEN t.FromUserId < t.ToUserId THEN t.FromUserId ELSE t.ToUserId END,
         CASE WHEN t.FromUserId < t.ToUserId THEN t.ToUserId   ELSE t.FromUserId END
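The query above returns one row per normalized pair rather than the single number the question asks for. A minimal sketch of one way to get that number, assuming the same dbo.Messages table (the aliases UserIdMin, UserIdMax, and UniquePairCount are illustrative names, not from the answer):

```sql
SELECT COUNT(*) AS UniquePairCount
FROM
(
    -- Normalize each row so the smaller id comes first, then deduplicate.
    SELECT DISTINCT
           CASE WHEN FromUserId < ToUserId THEN FromUserId ELSE ToUserId   END AS UserIdMin,
           CASE WHEN FromUserId < ToUserId THEN ToUserId   ELSE FromUserId END AS UserIdMax
    FROM dbo.Messages
) AS p;
```

With the sample data from the question, the normalized pairs are (1,5) and (2,20), so this returns 2.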

1

On SQL Server 2008 and later this should work:

SELECT distinct
    (SELECT Min(v) FROM (VALUES (FromUserId), (ToUserId)) AS value(v)) as UserIdMin,
    (SELECT Max(v) FROM (VALUES (FromUserId), (ToUserId)) AS value(v)) as UserIdMax
FROM dbo.Messages

Credit to: SQL MAX of multiple columns?

1 Comment

This is a great answer. To add to it, you don't need two separate subqueries; you could use CROSS APPLY to return both the MIN and MAX values. Here's an example: data.stackexchange.com/stackoverflow/query/524750/…
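A minimal sketch of what the CROSS APPLY variant could look like, wrapped in a COUNT so it returns the number the question asks for. This is not necessarily the query behind the link; the aliases m, v, p, d, UserId, UserIdMin, UserIdMax, and UniquePairCount are assumptions chosen for illustration.

```sql
SELECT COUNT(*) AS UniquePairCount
FROM
(
    SELECT DISTINCT p.UserIdMin, p.UserIdMax
    FROM dbo.Messages AS m
    CROSS APPLY
    (
        -- One pass over the two column values yields both MIN and MAX.
        SELECT MIN(v.UserId) AS UserIdMin, MAX(v.UserId) AS UserIdMax
        FROM (VALUES (m.FromUserId), (m.ToUserId)) AS v(UserId)
    ) AS p
) AS d;
```

With the sample data from the question this returns 2.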

0

Demo here

select distinct  t1.*
from
#temp t1
join
#temp t2
on t1.FromUserId=t2.ToUserId
and t1.ToUserId=t2.FromUserId

4 Comments

I don't think it meets A/C. If you remove the count and keep just DISTINCT, it brings back 1 and 5, and that seems wrong. (1, 5) and (5, 1) should be treated as the same.
@EvaldasBuinauskas: The user is asking for a count, and in this case it returns 2. Did you see the demo?
I did, but it only happens to return 2. Edit the data and your query no longer works; it will bring back an incorrect count.
@EvaldasBuinauskas: Thanks, I was thinking of this line from the user's question the whole time: "count should return 2". Updated now.
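The objection in the comments can be reproduced with a small counterexample. The sketch below assumes a #temp table with the same two columns and hypothetical rows that contain no reversed pairs, so the expected count is 3.

```sql
CREATE TABLE #temp (FromUserId INT, ToUserId INT);

-- Hypothetical data: three distinct pairs, none of them reversed.
INSERT #temp VALUES (1, 5), (2, 20), (3, 7);

-- The self-join keeps only rows whose reverse also exists,
-- so it returns no rows for this data.
SELECT DISTINCT t1.*
FROM #temp AS t1
JOIN #temp AS t2
  ON t1.FromUserId = t2.ToUserId
 AND t1.ToUserId   = t2.FromUserId;

DROP TABLE #temp;
```

Counting these rows gives 0 instead of 3, which shows that the self-join finds pairs that occur in both directions rather than counting unique pairs.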
