3

Given data:

grp data1 data2 data3
a 2 1 2
a 4 6 3
b 3 2 1
b 7 3 5

Expected output:

grp sum(data1) sum(data2)/sum(data1) sum(data3)/sum(data1)
a 6 1.166666667 0.83
a 10 0.5 0.6

Assume custom aggregation can be dependent on multiple columns and not always a simple division operation. I know using SQL query it's possible, but I am interested in an answer with apply and aggregate function if possible.

2 Answers 2

2

You can use groupby + assign here to generate required aggregations. You can apply whatever aggregate function is needed.

g = df.groupby('grp')
#                                         for custom agg func use .agg(custom_agg_func)
#                                                          ^^^^^
g[['data1']].agg('sum').assign(sum2 = lambda df: g['data2'].sum()/df['data1'],
                               sum3 = lambda df: g['data3'].sum()/df['data1'])
#                ^^^^^^
#    you can use custom agg func of your choice

     data1      sum2      sum3
grp                           
a        6  1.166667  0.833333
b       10  0.500000  0.600000
Sign up to request clarification or add additional context in comments.

Comments

2

You could do:

df2 = df.groupby('grp').sum()
df2.update(df2.drop(columns='data1').div(df2['data1'], axis=0))

or more classically:

df2 = df.groupby('grp').sum()
df2[['data2', 'data3']] = df2[['data2', 'data3']].div(df2['data1'], axis=0)

output:

     data1     data2     data3
grp                           
a        6  1.166667  0.833333
b       10  0.500000  0.600000
multiple aggregation

You can use the same logic with a MultiIndex:

df2 = df.groupby('grp').agg(['sum', 'mean'])
df2.update(df2.drop(columns='data1').stack(0).div(df2['data1'], axis=0).unstack())

output:

    data1      data2      data3     
      sum mean   sum mean   sum mean
grp                                 
a       6  3.0     7  3.5     5  2.5
b      10  5.0     5  2.5     6  3.0

3 Comments

what if I want other aggregation also like average? here you used the sum for all columns, but that is not usually what I have to do. Eg: sum(data1), avg(data2), min(data2), sum(data2)/sum(data1) etc
please check the above comment. @mozway
@Parshantgarg did it work for you?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.