pandas apply User defined function to grouped dataframe on multiple columns

Question

I would like to apply a function f1 by group to a dataframe:

import pandas as pd 
import numpy as np
data = np.array([['id1','id2','u','v0','v1'],
                ['A','A',10,1,7],
                ['A','A',10,2,8],
                ['A','B',20,3,9],
                ['B','A',10,4,10],
                ['B','B',30,5,11],
                ['B','B',30,6,12]])
z = pd.DataFrame(data = data[1:,:], columns=data[0,:])

def f1(u,v):
    return u*np.cumprod(v)

The result of the function depends on the column u and on the columns v0 or v1 (there can be thousands of v columns, because I'm running a simulation over a lot of paths).

The result should be like this

    id1 id2 new_v0  new_v1
0   A   A   10  70
1   A   A   20  560
2   A   B   60  180
3   B   A   40  100
4   B   B   150 330
5   B   B   900 3960

As a first attempt I tried output = z.groupby(['id1', 'id2']).apply(lambda x: f1(u = x.u,v =x.v0)), but I can't get a result even for a single column.

Thank you very much!

Are you sure the desired output you posted is correct? Especially the first two rows.
– UJIN, Commented Apr 10, 2020 at 13:36

anky · Accepted Answer · 2020-04-10 13:44:33Z

2

You can collect the column names starting with v into a list and select those columns after the groupby:

v_cols = z.columns[z.columns.str.startswith('v')].tolist()

z[['u']+v_cols] = z[['u']+v_cols].apply(pd.to_numeric)

out = z.assign(**z.groupby(['id1','id2'])[v_cols].cumprod()
                .mul(z['u'],axis=0).add_prefix('new_'))
print(out)

  id1 id2   u  v0  v1  new_v0  new_v1
0   A   A  10   1   7      10      70
1   A   A  10   2   8      20     560
2   A   B  20   3   9      60     180
3   B   A  10   4  10      40     100
4   B   B  30   5  11     150     330
5   B   B  30   6  12     900    3960
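If f1 itself has to be reused (for example because the real function is more involved than a cumulative product), one possible sketch is to call it per group and per column through transform. This is only an illustration under my own assumptions: the frame is rebuilt here with numeric dtypes, and the names v_cols and out are just for the example. It loops over the v columns in Python, so with thousands of columns the vectorised cumprod above is likely to be much faster.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but built with numeric dtypes from the start.
z = pd.DataFrame({
    "id1": ["A", "A", "A", "B", "B", "B"],
    "id2": ["A", "A", "B", "A", "B", "B"],
    "u": [10, 10, 20, 10, 30, 30],
    "v0": [1, 2, 3, 4, 5, 6],
    "v1": [7, 8, 9, 10, 11, 12],
})

def f1(u, v):
    return u * np.cumprod(v)

v_cols = [c for c in z.columns if c.startswith("v")]
out = z[["id1", "id2"]].copy()
for col in v_cols:
    # transform passes one group's values of `col` as a Series;
    # the index of that Series is used to look up the matching rows of u.
    out["new_" + col] = z.groupby(["id1", "id2"])[col].transform(
        lambda s: f1(z.loc[s.index, "u"], s)
    )

print(out)
```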

answered Apr 10, 2020 at 13:44

anky


2 Comments

Luca Monno Over a year ago

Thanks anky. What's the purpose of the ** in the assign?

anky Over a year ago

@LucaMonno the ** unpacks keyword arguments (kwargs); you can read more here. Basically, it treats the dataframe as a dictionary: the keys are unpacked as the column names and the values as the values of those columns.
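A small sketch of what the ** does in assign, using made-up frames (df and extra are hypothetical names, not from the question). It relies on a DataFrame behaving like a mapping from column name to column, which only works for keyword unpacking when the column names are strings:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
extra = pd.DataFrame({"b": [10, 20, 30], "c": [100, 200, 300]})

# ** unpacks the columns of `extra` into keyword arguments,
# so this call is equivalent to df.assign(b=extra["b"], c=extra["c"]).
unpacked = df.assign(**extra)
explicit = df.assign(b=extra["b"], c=extra["c"])

print(unpacked)
```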

BENY · Accepted Answer · 2020-04-10 13:34:02Z

2

The way you create your data frame turns the numeric values into strings (object dtype), so we convert them first and then use groupby + cumprod:

z[['u', 'v0', 'v1']] = z[['u', 'v0', 'v1']].apply(pd.to_numeric)
s = z.groupby(['id1', 'id2'])[['v0', 'v1']].cumprod().mul(z['u'], axis=0)
# z = z.join(s.add_prefix('New_'))
print(s)
    v0    v1
0   10    70
1   20   560
2   60   180
3   40   100
4  150   330
5  900  3960
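To see why the conversion is needed, here is a reduced sketch of the construction from the question (only two data rows and three columns, chosen by me for brevity). A NumPy array holds a single dtype, so mixing strings and integers stores everything as strings:

```python
import numpy as np
import pandas as pd

# Mixing strings and integers in one array makes NumPy store every cell as text.
data = np.array([["id1", "u", "v0"],
                 ["A", 10, 1],
                 ["A", 10, 2]])
z = pd.DataFrame(data=data[1:, :], columns=data[0, :])

first_before = z["u"].iloc[0]  # still a string at this point
print(type(first_before))

# pd.to_numeric parses the text back into numbers, column by column.
z[["u", "v0"]] = z[["u", "v0"]].apply(pd.to_numeric)
print(z.dtypes)
```

Building the frame from a dict of lists, or passing the numeric columns separately, avoids the problem in the first place.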

answered Apr 10, 2020 at 13:34

BENY


Allen Qin · Accepted Answer · 2020-04-10 13:47:59Z

0

If you want to handle more than 2 v columns, it's better not to reference them by name.

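# Convert u and every v column to numbers; the id columns are left out
num = z.drop(columns=['id1', 'id2']).apply(pd.to_numeric)
(
    num.drop(columns='u')
    .groupby([z['id1'], z['id2']]).cumprod()   # cumulative product of each v column per group
    .mul(num['u'], axis=0)                     # scale each row by its own u
)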

edited Apr 10, 2020 at 13:47

answered Apr 10, 2020 at 13:40

Allen Qin
