I have the dataframe below, which I am trying to group and aggregate.

Column_1 Column_2 Column_3
A        N1       P1
A        N2       P2
A        N3       P3
B        N1       P1
C        N1       P1
C        N2       P2

Required Output:

Column_1 Column_2 Column_3
A        N1,N2,N3 P1,P2,P3
B        N1       P1
C        N1,N2    P1,P2

I am able to do this for one column by creating a window with partitionBy, then applying collect_list over the window and aggregating to get the column. This works for one column.

How can I do the same for two columns? Kindly help.

1 Answer

The agg function of a groupBy can take more than one aggregation function. You can add collect_list twice:

from pyspark.sql import functions as F
df.groupby('Column_1').agg(F.collect_list('Column_2'), F.collect_list('Column_3')).orderBy('Column_1').show()

prints

+--------+----------------------+----------------------+
|Column_1|collect_list(Column_2)|collect_list(Column_3)|
+--------+----------------------+----------------------+
|       A|          [N1, N2, N3]|          [P1, P2, P3]|
|       B|                  [N1]|                  [P1]|
|       C|              [N1, N2]|              [P1, P2]|
+--------+----------------------+----------------------+

For a simple grouping there is no need to use a Window.

2 Comments

Working as expected. Is there a link or an article that clearly states in which scenarios we have to use a window, and where we can use groupBy as above? It would be helpful to learn. Thank you.
Maybe this link is helpful. As a rule of thumb, I would use windows when I expect the number of rows after the operation to stay the same, and a groupBy when I expect the number of rows in the result to be lower than in the original dataset.
