groupby on multiple columns and applying various functions

Question

I am trying to trim some trade reports. The original report looks like this:

AssetClass  Symbol  UnderlyingSymbol    Multiplier  Strike  Expiry  Put/Call    DateTime    Quantity    TradePrice  Commission  Buy/Sell
OPT ADBE  200221C00385000   ADBE    100 385 20200221    C   20200218,114515 1   1.4 2.5 BUY
OPT ADBE  200221C00385000   ADBE    100 385 20200221    C   20200218,114515 2   1.31    4.5 BUY

I would like to aggregate it as follows:

AssetClass  Symbol  UnderlyingSymbol    Multiplier  Strike  Expiry  Put/Call    DateTime    Quantity    TradePrice  Commission  Buy/Sell
OPT ADBE  200221C00385000   ADBE    100 385 20200221    C   20200218,114515 3   1.34    7   BUY

So I need a groupby on the Symbol and Buy/Sell columns, with sum applied to Quantity and Commission and a Quantity-weighted average applied to TradePrice.

import numpy as np
import pandas as pd

df = pd.read_csv(filename)
wm = lambda x: np.average(x, weights=df.loc[x.index, "Quantity"])
f = {'Quantity': 'sum', 'Commission': 'sum'}
df.groupby(['Symbol', 'Buy/Sell']).agg(f)

I have multiple issues:

The output "forgets" the other columns, and if I add these columns to the groupby, I get some blanks here and there.
How can I apply the function wm to the TradePrice column?
For the DateTime column (format is "yyyymmdd,hhmmss"), I would like to get just the date (which is the same for all rows).

Here is the output when I add the AssetClass column, for instance:

                                               Quantity  Commission
AssetClass Symbol                Buy/Sell                                     
OPT        ACN   200221P00212500 SELL            -3      0.003649
           ACN   200320C00215000 BUY              9     -6.694200
           ACN   200320P00215000 BUY              9     -6.694200
           XYZ  200221C00385000 BUY              2     -1.677600
                                 SELL            -4     -1.794891

jezrael · Accepted Answer · 2020-02-23 09:02:21Z

To remove the time part from the DateTime column, use Series.str.split:

df['DateTime'] = df['DateTime'].str.split(',').str[0]
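
If a real date value is preferred over a string, the column could also be parsed rather than split. This is only a minimal sketch, assuming every entry follows the "yyyymmdd,hhmmss" layout seen in the sample rows; the sample Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample values in the same "yyyymmdd,hhmmss" layout as the report
s = pd.Series(["20200218,114515", "20200219,093000"])

# String approach from the answer: keep the text before the comma
date_str = s.str.split(",").str[0]

# Alternative: parse to timestamps, then reset the time part to midnight
parsed = pd.to_datetime(s, format="%Y%m%d,%H%M%S")
date_only = parsed.dt.normalize()
```

The string version keeps the column as text, which matches the desired output in the question; the parsed version is more convenient if the dates are later sorted or filtered.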

To apply the new function, add it to the dictionary like the other functions:

wm = lambda x: np.average(x, weights=df.loc[x.index, "Quantity"])
f = {'Quantity': 'sum', 'Commission': 'sum', 'TradePrice':wm}
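
As a quick check of what wm computes, the weighted average for the two sample rows can be worked out by hand: (1 × 1.4 + 2 × 1.31) / (1 + 2) = 4.02 / 3 = 1.34. The sketch below repeats that arithmetic with np.average, using only the prices and quantities from the question.

```python
import numpy as np

# TradePrice and Quantity values from the two sample rows in the question
prices = [1.4, 1.31]
quantities = [1, 2]

# Manual weighted average: sum(price * quantity) / sum(quantity)
manual = (1 * 1.4 + 2 * 1.31) / (1 + 2)

# Same calculation via NumPy
weighted = np.average(prices, weights=quantities)
```

Note that wm looks up the weights with df.loc[x.index, "Quantity"], so it relies on df having unique index labels (the default RangeIndex from read_csv satisfies this).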

Finally, to avoid losing the other columns: if they hold the same values within each group of Symbol and Buy/Sell, you can add them to the groupby:

cols = ['AssetClass', 'Symbol', 'UnderlyingSymbol', 'Multiplier', 'Strike',
         'Expiry', 'Put/Call', 'DateTime', 'Buy/Sell']
df1 = df.groupby(cols).agg(f).reset_index()
print (df1)
  AssetClass                Symbol UnderlyingSymbol  Multiplier  Strike  \
0        OPT  ADBE 200221C00385000             ADBE         100     385   

     Expiry Put/Call  DateTime Buy/Sell  Quantity  Commission  TradePrice  
0  20200221        C  20200218      BUY         3         7.0        1.34
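
For anyone who wants to try this without the CSV file, here is a self-contained sketch that rebuilds the two sample rows from the question by hand (the DataFrame construction is an assumption standing in for read_csv) and applies the same steps.

```python
import numpy as np
import pandas as pd

# Two sample rows from the question, typed in by hand instead of read from CSV
df = pd.DataFrame({
    "AssetClass": ["OPT", "OPT"],
    "Symbol": ["ADBE  200221C00385000"] * 2,
    "UnderlyingSymbol": ["ADBE", "ADBE"],
    "Multiplier": [100, 100],
    "Strike": [385, 385],
    "Expiry": [20200221, 20200221],
    "Put/Call": ["C", "C"],
    "DateTime": ["20200218,114515", "20200218,114515"],
    "Quantity": [1, 2],
    "TradePrice": [1.4, 1.31],
    "Commission": [2.5, 4.5],
    "Buy/Sell": ["BUY", "BUY"],
})

# Keep only the date part of DateTime
df["DateTime"] = df["DateTime"].str.split(",").str[0]

# Quantity-weighted average; weights are looked up by the group's index labels
def wm(x):
    return np.average(x, weights=df.loc[x.index, "Quantity"])

f = {"Quantity": "sum", "Commission": "sum", "TradePrice": wm}

# Group by every column that is constant within a Symbol / Buy/Sell group
cols = ["AssetClass", "Symbol", "UnderlyingSymbol", "Multiplier", "Strike",
        "Expiry", "Put/Call", "DateTime", "Buy/Sell"]
df1 = df.groupby(cols).agg(f).reset_index()
```

The reset_index call turns the group keys back into ordinary columns, so every key value is repeated on each row instead of being left blank as in the MultiIndex display.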

If the values in the other columns are not the same within each group of Symbol and Buy/Sell, you need to specify an aggregate function for each column and add it to the dictionary, e.g. first for AssetClass and mean for Multiplier:

df['DateTime'] = df['DateTime'].str.split(',').str[0]

wm = lambda x: np.average(x, weights=df.loc[x.index, "Quantity"])

f = {'Quantity': 'sum', 
     'Commission': 'sum', 
     'TradePrice':wm, 
     'AssetClass':'first', 
     'Multiplier':'mean', ....}

df2 = df.groupby(['Symbol', 'Buy/Sell']).agg(f).reset_index()
print (df2)
                 Symbol Buy/Sell  Quantity  Commission  TradePrice AssetClass  \
0  ADBE 200221C00385000      BUY         3         7.0        1.34        OPT   

   Multiplier  
0         100
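
An alternative that does not rely on a lambda reaching back into df is to pre-compute price × quantity, sum it per group, and divide by the summed quantity. This is a different technique from the one in the answer, shown only as a sketch: the helper column name Notional, the second symbol, and its values are made up for illustration, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# First two rows follow the question; the third row is hypothetical
df = pd.DataFrame({
    "Symbol": ["ADBE  200221C00385000"] * 2 + ["XYZ  200221C00385000"],
    "Buy/Sell": ["BUY", "BUY", "SELL"],
    "Quantity": [1, 2, -4],
    "TradePrice": [1.4, 1.31, 2.0],
    "Commission": [2.5, 4.5, 1.0],
    "AssetClass": ["OPT", "OPT", "OPT"],
})

out = (
    # Hypothetical helper column: price times quantity for each trade
    df.assign(Notional=df["TradePrice"] * df["Quantity"])
      .groupby(["Symbol", "Buy/Sell"], as_index=False)
      .agg(
          Quantity=("Quantity", "sum"),
          Commission=("Commission", "sum"),
          Notional=("Notional", "sum"),
          AssetClass=("AssetClass", "first"),
      )
)

# Weighted average price = summed notional / summed quantity
out["TradePrice"] = out["Notional"] / out["Quantity"]
out = out.drop(columns="Notional")
```

This only uses built-in aggregations, but it divides by the summed quantity, so a group whose quantities net to zero would need separate handling.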
