Pandas apply function by group returning multiple new columns

Question

I am trying to apply a function to a column by group with the objective of creating 2 new columns, containing the returned values of the function for each group. Example as follows:

import numpy as np
import pandas as pd

def testms(x):
    # x is the Series of 'B' values for one group
    mu = np.sum(x)
    si = np.sum(x)/2
    return mu, si

df = pd.concat([pd.DataFrame({'A' : [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]}), pd.DataFrame({'B' : np.random.rand(10)})],axis=1)
df

   A      B
0  1  0.696761
1  1  0.035178
2  1  0.468180
3  1  0.157818
4  1  0.281470
5  2  0.377689
6  2  0.336046
7  2  0.005879
8  2  0.747436
9  2  0.772405

desired_result = 

   A      B         mu        si
0  1  0.696761   1.652595   0.826297
1  1  0.035178   1.652595   0.826297
2  1  0.468180   1.652595   0.826297
3  1  0.157818   1.652595   0.826297
4  1  0.281470   1.652595   0.826297
5  2  0.377689   2.997657   1.498829
6  2  0.336046   2.997657   1.498829
7  2  0.005879   2.997657   1.498829
8  2  0.747436   2.997657   1.498829
9  2  0.772405   2.997657   1.498829

I think I have found a solution but I am looking for something a bit more elegant and efficient:

x = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si']))

A       
1  mu    1.652595
   si    0.826297
2  mu    2.997657
   si    1.498829
Name: B, dtype: float64

df.merge(x.drop(labels='mu',level=1),on='A',how='outer').merge(x.drop(labels='si',level=1),on='A',how='outer')

jezrael · Accepted Answer · 2019-11-05 13:06:35Z

One idea is to change the function so that it adds the new columns mu and si to each group and returns the group x itself:

def testms(x):
    mu = np.sum(x['B'])
    si = np.sum(x['B'])/2
    x['mu'] = mu
    x['si'] = si
    return x

x = df.groupby('A').apply(testms)
print (x)
   A         B        mu        si
0  1  0.352297  3.590048  1.795024
1  1  0.860488  3.590048  1.795024
2  1  0.939260  3.590048  1.795024
3  1  0.988280  3.590048  1.795024
4  1  0.449723  3.590048  1.795024
5  2  0.125852  1.300524  0.650262
6  2  0.853474  1.300524  0.650262
7  2  0.000996  1.300524  0.650262
8  2  0.223886  1.300524  0.650262
9  2  0.096316  1.300524  0.650262

Your solution can be simplified with Series.unstack and DataFrame.join (this uses your original testms, which takes a Series and returns a tuple):

df1 = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si'])).unstack()
x = df.join(df1, on='A')
print (x)
   A         B        mu        si
0  1  0.085961  2.791346  1.395673
1  1  0.887589  2.791346  1.395673
2  1  0.685952  2.791346  1.395673
3  1  0.946613  2.791346  1.395673
4  1  0.185231  2.791346  1.395673
5  2  0.994415  3.173444  1.586722
6  2  0.159852  3.173444  1.586722
7  2  0.773711  3.173444  1.586722
8  2  0.867337  3.173444  1.586722
9  2  0.378128  3.173444  1.586722
