
I would like to apply a function f1 by group to a dataframe:

import pandas as pd 
import numpy as np
data = np.array([['id1','id2','u','v0','v1'],
                ['A','A',10,1,7],
                ['A','A',10,2,8],
                ['A','B',20,3,9],
                ['B','A',10,4,10],
                ['B','B',30,5,11],
                ['B','B',30,6,12]])
z = pd.DataFrame(data = data[1:,:], columns=data[0,:])

def f1(u,v):
    return u*np.cumprod(v)

The result of the function depends on column u and on one of the columns v0 or v1 (there can be thousands of v columns, because I'm running a simulation over many paths).

The result should look like this:

  id1 id2  new_v0  new_v1
0   A   A      10      70
1   A   A      20     560
2   A   B      60     180
3   B   A      40     100
4   B   B     150     330
5   B   B     900    3960

As a start, I tried output = z.groupby(['id1', 'id2']).apply(lambda x: f1(u=x.u, v=x.v0)), but I can't get a result even for a single column.

Thank you very much!

  • Are you sure the desired output you posted is correct? Especially the first two rows. Commented Apr 10, 2020 at 13:36
  • Thanks, I've corrected it. Commented Apr 11, 2020 at 15:37

3 Answers


You can collect the column names that start with v into a list and pass that list to the groupby column selection:

v_cols = z.columns[z.columns.str.startswith('v')].tolist()

z[['u']+v_cols] = z[['u']+v_cols].apply(pd.to_numeric)

out = z.assign(**z.groupby(['id1','id2'])[v_cols].cumprod()
                .mul(z['u'],axis=0).add_prefix('new_'))
print(out)

  id1 id2   u  v0  v1  new_v0  new_v1
0   A   A  10   1   7      10      70
1   A   A  10   2   8      20     560
2   A   B  20   3   9      60     180
3   B   A  10   4  10      40     100
4   B   B  30   5  11     150     330
5   B   B  30   6  12     900    3960
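
If you would rather keep calling f1 itself, here is a minimal sketch of one way to do it. It assumes the columns are already numeric (the frame is built from a dict here purely so the example is self-contained), and the names parts and new are only illustrative:

```python
import numpy as np
import pandas as pd

# Same values as in the question, built from a dict so dtypes stay numeric
z = pd.DataFrame({
    'id1': list('AAABBB'),
    'id2': list('AABABB'),
    'u': [10, 10, 20, 10, 30, 30],
    'v0': [1, 2, 3, 4, 5, 6],
    'v1': [7, 8, 9, 10, 11, 12],
})

def f1(u, v):
    return u * np.cumprod(v)

v_cols = [c for c in z.columns if c.startswith('v')]

# Apply f1 to every v column of every (id1, id2) group
parts = [
    g[v_cols].apply(lambda v: f1(g['u'], v))
    for _, g in z.groupby(['id1', 'id2'])
]

# Stitch the groups back together in the original row order
new = pd.concat(parts).sort_index()
out = z[['id1', 'id2']].join(new.add_prefix('new_'))
print(out)
```

This loops over groups in Python, so with thousands of v columns the vectorised groupby + cumprod version above is likely to be faster.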

2 Comments

Thanks anky. What's the purpose of the ** in the assign?
@LucaMonno It unpacks keyword arguments (kwargs); you can read more here. Basically, it treats the DataFrame as a dictionary: the keys are unpacked as column names and the values as the contents of those columns.
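
To illustrate the ** unpacking, a small sketch with made-up frames (df and extra are hypothetical names, not from the question). It relies on the column labels being strings that are valid keyword names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
extra = pd.DataFrame({'b': [10, 20, 30], 'c': [100, 200, 300]})

# ** treats the DataFrame as a mapping: column label -> column
unpacked = df.assign(**extra)

# which amounts to spelling out each column as a keyword argument
explicit = df.assign(b=extra['b'], c=extra['c'])

print(unpacked.equals(explicit))
```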

The way you create your DataFrame turns the numeric columns into object dtype, so convert them first, then use groupby + cumprod:

z[['u','v0','v1']] = z[['u','v0','v1']].apply(pd.to_numeric)
s = z.groupby(['id1','id2'])[['v0','v1']].cumprod().mul(z['u'], axis=0)
# z = z.join(s.add_prefix('New_'))  # optionally attach the result to z
print(s)
    v0    v1
0   10    70
1   20   560
2   60   180
3   40   100
4  150   330
5  900  3960
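
To see why the conversion matters, here is a small sketch on a cut-down version of the question's array (raw and fixed are illustrative names). Mixing strings and numbers in one NumPy array stores everything as strings, so the resulting columns are not numeric until they are converted:

```python
import numpy as np
import pandas as pd

# Strings and ints in one array: NumPy stores every element as a string
data = np.array([['id1', 'u', 'v0'],
                 ['A', 10, 1],
                 ['A', 10, 2]])
raw = pd.DataFrame(data=data[1:, :], columns=data[0, :])
print(pd.api.types.is_numeric_dtype(raw['u']))    # the column holds text

# Convert the value columns before doing any arithmetic
fixed = raw.assign(u=pd.to_numeric(raw['u']), v0=pd.to_numeric(raw['v0']))
print(pd.api.types.is_numeric_dtype(fixed['u']))
print(fixed['u'] * fixed['v0'].cumprod())
```

Building the frame from a dict of lists instead of a mixed NumPy array would avoid the problem in the first place.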


If you want to handle more than two v columns, it's better not to reference them by name.

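keys = ['id1', 'id2']
vals = z.drop(columns=keys).apply(pd.to_numeric)  # u plus every v column, as numbers
(
    vals.drop(columns='u')                     # whatever is left is a v column
    .groupby([z[k] for k in keys]).cumprod()   # cumulative product within each group
    .mul(vals['u'], axis=0)                    # scale each row by its u
)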
