I want to do something like GROUP BY / GROUP_CONCAT in MySQL using pandas. Let's say I have:

table_a

col_a col_b
A     1
B     2
C     2

table_b

col_a col_c
A     VALUE_1
A     VALUE_2
B     VALUE_3
C     VALUE_4

I want a new table_c as follows:

col_a col_b col_c
A     1      VALUE_1, VALUE_2
B     2      VALUE_3
C     2      VALUE_4

I've been using pd.merge but I cannot find a way to do the concatenation and avoid duplicates.
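For reference, here is a minimal sketch that reproduces the problem. The variable names `table_a`, `table_b`, and `merged` are only illustrative; the data is copied from the tables above.

```python
import pandas as pd

# The two input tables from the question.
table_a = pd.DataFrame({"col_a": ["A", "B", "C"], "col_b": [1, 2, 2]})
table_b = pd.DataFrame({
    "col_a": ["A", "A", "B", "C"],
    "col_c": ["VALUE_1", "VALUE_2", "VALUE_3", "VALUE_4"],
})

# A plain merge joins on the shared column col_a and repeats the
# table_a row once per matching table_b row, so key A appears twice.
merged = table_a.merge(table_b, on="col_a")
print(merged)
```

The merged frame has four rows rather than three, which is the duplication that a GROUP_CONCAT-style aggregation needs to collapse.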

2 Answers

Alternatively, merge first, then group by col_a and aggregate with agg:

df1.merge(df2).groupby('col_a',as_index=False).agg({'col_b':'first','col_c':','.join})
Out[46]: 
  col_a  col_b            col_c
0     A      1  VALUE_1,VALUE_2
1     B      2          VALUE_3
2     C      2          VALUE_4
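A self-contained version of the merge-then-aggregate approach might look like the sketch below. It assumes `df1` and `df2` hold the question's table_a and table_b, and it joins with `', '` (comma plus space) instead of `','` so that the result matches the spacing in the desired table_c. The name `table_c` for the result is taken from the question.

```python
import pandas as pd

# Assumed inputs: df1 is table_a and df2 is table_b from the question.
df1 = pd.DataFrame({"col_a": ["A", "B", "C"], "col_b": [1, 2, 2]})
df2 = pd.DataFrame({
    "col_a": ["A", "A", "B", "C"],
    "col_c": ["VALUE_1", "VALUE_2", "VALUE_3", "VALUE_4"],
})

table_c = (
    df1.merge(df2, on="col_a")           # one row per (col_a, col_c) pair
       .groupby("col_a", as_index=False)  # collapse back to one row per key
       .agg({
           "col_b": "first",              # col_b is constant within a key
           "col_c": ", ".join,            # GROUP_CONCAT-style concatenation
       })
)
print(table_c)
```

Using `'first'` for col_b relies on col_b having a single value per col_a, which holds here because col_a is unique in table_a.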

1 Comment

I'll select this just because I like the fact that I can use 'first'

Group by col_a before merging, so that 'col_a' is unique in the right DataFrame:

df1.merge(df2.groupby('col_a').col_c.apply(', '.join).reset_index())

  col_a  col_b             col_c
0     A      1  VALUE_1, VALUE_2
1     B      2           VALUE_3
2     C      2           VALUE_4
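As a hedged variation on the group-then-merge approach, the sketch below adds two things that are not in the answer: `how="left"`, so that rows of the left frame without any match are kept, and `validate="many_to_one"`, so that pandas raises an error if col_a is not unique in the right frame. The extra key `"D"` and the name `joined` are hypothetical and exist only to illustrate the unmatched case.

```python
import pandas as pd

# table_a from the question, plus a hypothetical key "D" with no match.
df1 = pd.DataFrame({"col_a": ["A", "B", "C", "D"], "col_b": [1, 2, 2, 3]})
df2 = pd.DataFrame({
    "col_a": ["A", "A", "B", "C"],
    "col_c": ["VALUE_1", "VALUE_2", "VALUE_3", "VALUE_4"],
})

# Collapse df2 to one row per col_a before merging.
joined = df2.groupby("col_a")["col_c"].agg(", ".join).reset_index()

# Left merge keeps every df1 row; validate checks the right keys are unique.
table_c = df1.merge(joined, on="col_a", how="left", validate="many_to_one")
print(table_c)
```

With the default inner merge, the `"D"` row would be dropped; with the left merge it is kept and its col_c is NaN.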
