I am trying to apply a function to a column by group with the objective of creating 2 new columns, containing the returned values of the function for each group. Example as follows:

import numpy as np
import pandas as pd

def testms(x):
    mu = np.sum(x)
    si = np.sum(x)/2
    return mu, si

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], 'B': np.random.rand(10)})
df

   A      B
0  1  0.696761
1  1  0.035178
2  1  0.468180
3  1  0.157818
4  1  0.281470
5  2  0.377689
6  2  0.336046
7  2  0.005879
8  2  0.747436
9  2  0.772405

desired_result = 

   A      B         mu        si
0  1  0.696761   1.652595   0.826297
1  1  0.035178   1.652595   0.826297
2  1  0.468180   1.652595   0.826297
3  1  0.157818   1.652595   0.826297
4  1  0.281470   1.652595   0.826297
5  2  0.377689   2.997657   1.498829
6  2  0.336046   2.997657   1.498829
7  2  0.005879   2.997657   1.498829
8  2  0.747436   2.997657   1.498829
9  2  0.772405   2.997657   1.498829

I think I have found a solution but I am looking for something a bit more elegant and efficient:

x = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si']))

A       
1  mu    1.652595
   si    0.826297
2  mu    2.997657
   si    1.498829
Name: B, dtype: float64

df.merge(x.drop(labels='mu',level=1),on='A',how='outer').merge(x.drop(labels='si',level=1),on='A',how='outer')

1 Answer

One option is to change the function so that it adds new columns filled with the mu and si values and returns the whole group x:

def testms(x):
    mu = np.sum(x['B'])
    si = np.sum(x['B'])/2
    x['mu'] = mu
    x['si'] = si
    return x

# group_keys=False keeps the original flat index on pandas >= 2.0
x = df.groupby('A', group_keys=False).apply(testms)
print (x)
   A         B        mu        si
0  1  0.352297  3.590048  1.795024
1  1  0.860488  3.590048  1.795024
2  1  0.939260  3.590048  1.795024
3  1  0.988280  3.590048  1.795024
4  1  0.449723  3.590048  1.795024
5  2  0.125852  1.300524  0.650262
6  2  0.853474  1.300524  0.650262
7  2  0.000996  1.300524  0.650262
8  2  0.223886  1.300524  0.650262
9  2  0.096316  1.300524  0.650262
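
A variant of the same idea, as a minimal sketch rather than the code from the answer: the helper name add_group_stats, the fixed values in B, and the explicit [['B']] column selection are assumptions made for illustration. It returns a modified copy through DataFrame.assign instead of mutating the group in place, and joins the new columns back on the original index.

```python
import numpy as np
import pandas as pd

def add_group_stats(group):
    # Hypothetical helper: return a copy of the group with the two new columns.
    total = group['B'].sum()
    return group.assign(mu=total, si=total / 2)

# Deterministic data (an assumption) so the result is reproducible.
df = pd.DataFrame({'A': [1] * 5 + [2] * 5, 'B': np.arange(1.0, 11.0)})

# Selecting [['B']] passes only column B to the helper;
# group_keys=False keeps the original row index.
stats = df.groupby('A', group_keys=False)[['B']].apply(add_group_stats)

# Attach the new columns to the original frame by index.
result_apply = df.join(stats[['mu', 'si']])
print(result_apply)
```

Using assign leaves the frame that pandas hands to the function untouched, which avoids surprises from in-place modification inside apply.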

Your solution can be simplified with Series.unstack and DataFrame.join (this uses the original testms from the question, which returns a (mu, si) tuple):

df1 = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si'])).unstack()
x = df.join(df1, on='A')
print (x)
   A         B        mu        si
0  1  0.085961  2.791346  1.395673
1  1  0.887589  2.791346  1.395673
2  1  0.685952  2.791346  1.395673
3  1  0.946613  2.791346  1.395673
4  1  0.185231  2.791346  1.395673
5  2  0.994415  3.173444  1.586722
6  2  0.159852  3.173444  1.586722
7  2  0.773711  3.173444  1.586722
8  2  0.867337  3.173444  1.586722
9  2  0.378128  3.173444  1.586722