9

Given the following df:

  Id other  concat
0  A     z       1
1  A     y       2
2  B     x       3
3  B     w       4
4  B     v       5
5  B     u       6

I want to add a new column that holds each group's values as a list:

  Id other  concat           new
0  A     z       1        [1, 2]
1  A     y       2        [1, 2]
2  B     x       3  [3, 4, 5, 6]
3  B     w       4  [3, 4, 5, 6]
4  B     v       5  [3, 4, 5, 6]
5  B     u       6  [3, 4, 5, 6]

This is similar to these questions:

grouping rows in list in pandas groupby

Replicating GROUP_CONCAT for pandas.DataFrame

However, I want to apply the grouping you get from df.groupby('Id')['concat'].apply(list), which is a Series shorter than the dataframe, back to the original dataframe.

I have tried the code below, but it does not apply this to the dataframe:

import pandas as pd
df = pd.DataFrame( {'Id':['A','A','B','B','B','C'], 'other':['z','y','x','w','v','u'], 'concat':[1,2,5,5,4,6]})
df.groupby('Id')['concat'].apply(list)

I know that transform can be used to apply groupings to dataframes, but it does not work in this case.

>>> df['new_col'] = df.groupby('Id')['concat'].transform(list)
>>> df
  Id  concat other  new_col
0  A       1     z        1
1  A       2     y        2
2  B       5     x        5
3  B       5     w        5
4  B       4     v        4
5  C       6     u        6
>>> df['new_col'] = df.groupby('Id')['concat'].apply(list)
>>> df
  Id  concat other new_col
0  A       1     z     NaN
1  A       2     y     NaN
2  B       5     x     NaN
3  B       5     w     NaN
4  B       4     v     NaN
5  C       6     u     NaN

3 Answers

7

groupby with join

df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
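
The join works because the grouped Series is indexed by Id, so on='Id' looks each row's Id up in it. Plain assignment instead aligns on the row labels 0..5, which never match 'A' or 'B', hence the NaN column in the question. As a rough sketch of the same lookup (Series.map is my substitution here, not part of the answer above, and the frame is rebuilt from the question's first table):

```python
import pandas as pd

# Data copied from the first table in the question.
df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "B"],
    "other": ["z", "y", "x", "w", "v", "u"],
    "concat": [1, 2, 3, 4, 5, 6],
})

# One list per group; the resulting Series is indexed by Id ('A', 'B').
lists = df.groupby("Id")["concat"].apply(list)

# Look up each row's Id in that Series; the result keeps df's row index,
# so it can be assigned directly as a new column.
df["new"] = df["Id"].map(lists)
print(df)
```

Unlike join, this assigns the column in place rather than returning a new frame.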

3

A less elegant (and slower) solution, included here just as an alternative.

def func(gr):
    # Repeat the group's values, as one list, once per row of the group.
    gr['new'] = [list(gr.concat)] * len(gr.index)
    return gr
df.groupby('Id').apply(func)

%timeit df.groupby('Id').apply(func)
100 loops, best of 3: 4.18 ms per loop

%timeit df.join(df.groupby('Id').concat.apply(list).to_frame('new'), on='Id')
1000 loops, best of 3: 1.69 ms per loop

2

Use transform with [x.tolist()] (for Python lists) or [x.values] (for NumPy arrays):

In [1396]: df.groupby('Id')['concat'].transform(lambda x: [x.tolist()])
Out[1396]:
0          [1, 2]
1          [1, 2]
2    [3, 4, 5, 6]
3    [3, 4, 5, 6]
4    [3, 4, 5, 6]
5    [3, 4, 5, 6]
Name: concat, dtype: object

In [1397]: df['new'] = df.groupby('Id')['concat'].transform(lambda x: [x.tolist()])

In [1398]: df
Out[1398]:
  Id other  concat           new
0  A     z       1        [1, 2]
1  A     y       2        [1, 2]
2  B     x       3  [3, 4, 5, 6]
3  B     w       4  [3, 4, 5, 6]
4  B     v       5  [3, 4, 5, 6]
5  B     u       6  [3, 4, 5, 6]
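
The lambda above returns a one-element list per group and appears to rely on pandas stretching it across the group's rows. A minimal sketch that spells the repetition out instead (the explicit `* len(x)` is my variation, not part of the answer above, and the data is the question's first table):

```python
import pandas as pd

# Data copied from the first table in the question.
df = pd.DataFrame({
    "Id": ["A", "A", "B", "B", "B", "B"],
    "other": ["z", "y", "x", "w", "v", "u"],
    "concat": [1, 2, 3, 4, 5, 6],
})

# Return one entry per row of the group, each entry being the group's
# full list of values, so the result has the same length as the group.
df["new"] = df.groupby("Id")["concat"].transform(
    lambda x: [x.tolist()] * len(x)
)
print(df)
```

Every row of a group ends up referring to the same list object, so modifying one row's list in place also changes the other rows of that group.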
