0

I have table like below:

|Group|User|
|  1  | X  | 
|  1  | Y  |
|  1  | Z  |
|  2  | X  |
|  2  | Y  |
|  2  | Z  |
|  3  | X  |
|  3  | Z  |
|  4  | X  |

I want to calculate intersections of groups: 1&2, 1&2&3, 1&2&3&4. Then I want to see users in each intersection and how many of them are there. In the given example should be:

1&2     -> 3
1&2&3   -> 2
1&2&3&4 -> 1

Is it possible using SQL? If yes, how can I start?

On a daily basis I'm working with Python, but now I have to translate this code into SQL code and I have huge problem with it.

4
  • What's the database? I would assume PostgreSQL if you don't say. Commented May 21, 2022 at 19:28
  • It's in Clickhouse. But it'll be in BigQuery soon. Commented May 21, 2022 at 19:35
  • I think if I handle it in any DB, I'll do it in another. Commented May 21, 2022 at 19:37
  • 1
    It would be easy with a recursive CTE, I believe neither Clickhouse nor BigQuery supports it Commented May 21, 2022 at 20:37

1 Answer 1

1

In PostgreSQL you can use JOIN or INTERSECT. Example with JOIN:

select count(*)
from t a
join t b on b.usr = a.usr
where a.grp = 1 and b.grp = 2

Then:

select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3

And:

select count(*)
from t a
join t b on b.usr = a.usr
join t c on c.usr = a.usr
join t d on d.usr = a.usr
where a.grp = 1 and b.grp = 2 and c.grp = 3 and d.grp = 4

Result in 3, 2, and 1 respectively.

EDIT - You can also do:

select count(*)
from (
  select usr, count(*) 
  from t
  where grp in (1, 2, 3, 4)
  group by usr
  having count(*) = 4 -- number of groups
) x

See running example at db<>fiddle.

Sign up to request clarification or add additional context in comments.

5 Comments

What If I have more groups? Like 15-20. Can be there any other way to solve it?
Great! It looks good! Can I add loop or sth, so I don't need to write this 15-20 times? Cuz I also have to modify tuple with groups: where grp in (1, 2, 3) and It'll be frustrating.
@Joe , you can create a parameterized query passing it the list of groups in a form of array or csv string. If you have a problem creating a query like that you'd better ask another question.
It's something wrong with this approach. I made it twice in 2 different ways: - length(arrayIntersect(groupUniqArray, )) in CH and - using pandas in Python. Both ways had same results, but different than yours.
Ok. I got it. I had to drop duplicates or change the condition from having count() = 4 to having count() >= 4.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.