1

Let´s say I´ve the given dataframe:

import pandas as pd

import numpy as np

# Create the dataframe:
data = {'date'              : ['2022-01-01', '2022-01-02', '2022-01-02', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-03', '2022-01-04'],
        'ticker'            : ['TSLA','TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'AAPL', 'AAPL'],
        'type'              : ['buy', 'buy', 'split', 'sell', 'buy', 'sell', 'buy', 'buy', 'buy'],
        'units'             : [2, 2, 3, 2, 2, 2, 2, 2.5, 4],
        'price'             : [50.5, 50.5, 0, 52, 35, 52, 35, 50.5, 50.5],
        'fees'              : [4, 4, 0, 2.5, 3, 2.5, 3, 1.5, 2],
}

df = pd.DataFrame(data)
df
Out[3]: 
         date ticker   type  units  price  fees
0  2022-01-01   TSLA    buy    2.0   50.5   4.0
1  2022-01-02   TSLA    buy    2.0   50.5   4.0
2  2022-01-02   TSLA  split    3.0    0.0   0.0
3  2022-01-03   TSLA   sell    2.0   52.0   2.5
4  2022-01-03   TSLA    buy    2.0   35.0   3.0
5  2022-01-03   TSLA   sell    2.0   52.0   2.5
6  2022-01-03   TSLA    buy    2.0   35.0   3.0
7  2022-01-03   AAPL    buy    2.5   50.5   1.5
8  2022-01-04   AAPL    buy    4.0   50.5   2.0

Whereas I calculate the following columns:

# Create the column transaction value

def calc_transaction_value(type_transaction, price_unit, transacted_units, fees):
    """
    Calculate the transaction value from three columns in a dataframe depending on the value of type.
    """
    if type_transaction == 'buy':
        return price_unit * transacted_units + fees # i.e. the transacted cost
    if type_transaction == 'sell':
        return price_unit * transacted_units - fees # i.e. the gross income of capital
    else:
        return np.nan # If other return filler (i.e return NaN)

df['transacted_value'] = df.apply(lambda x: calc_transaction_value(x['type'], x['price'], x['units'], x['fees']), axis=1).fillna(0)

df
Out[4]: 
         date ticker   type  units  price  fees  transacted_value
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00

And:

# create the flow of units, depends on transaction type (buy or sell)
df["flow_units"] = df.apply(lambda x: -x["units"] if x["type"] == "sell" else x["units"], axis=1)

# Create the cml_units and prev_units column
df = df.groupby("ticker").apply(lambda grp: grp.assign(cml_units=grp["flow_units"].cumsum().abs(),
                                                           prev_units=grp["flow_units"].shift(1).cumsum().abs().fillna(0)))
df
Out[5]: 
         date ticker   type  units  price  fees  transacted_value  flow_units  \
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00         3.0   
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75         2.5   
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00         4.0   

   cml_units  prev_units  
0        2.0         0.0  
1        4.0         2.0  
2        7.0         4.0  
3        5.0         7.0  
4        7.0         5.0  
5        5.0         7.0  
6        7.0         5.0  
7        2.5         0.0  
8        6.5         2.5  

Where the computation of columns for cml_units and prev_units is inspired by @krassowski in thread.

But here I would like to create the columns prev_costs, cml_costs and cost_transaction. The logic is as follows:

I want to for each row, get the ticker from column ticker, and:

  • if transaction type value is equal to buy in column type then compute the operation prev_costs + transacted_value and populate in column cml_costs.
    • Where prev_costs for that row is cml_costs but shifted one row up (if grouped by ticker).
  • if transaction type value is equal to sell then compute the operation prev_costs - cost_transaction and populate in cml_costs.
    • Where prev_costs for that row is cml_costs but shifted one row up (if grouped by ticker).
    • And where cost_transaction is the result of operation (for that row): (units/cml_units) * prev_costs
  • else return prev_costs and populate in cml_costs.

I expect the resulting dataframe as:

data_2 = {'prev_costs'           : [0, 105, 210, 210, 150, 223, 159.29, 0, 127.75],
          'cml_costs'            : [105, 210, 210, 150, 223, 159.29, 232.29, 127.75, 331.75],
          'cost_transaction'     : [0, 0, 0, 60, 0, 63.71, 0, 0, 0],
}

df2 = pd.DataFrame(data_2)

df_expected = pd.concat([df, df2], axis=1, join='inner')

df_expected
Out[6]: 
         date ticker   type  units  price  fees  transacted_value  flow_units  \
0  2022-01-01   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
1  2022-01-02   TSLA    buy    2.0   50.5   4.0            105.00         2.0   
2  2022-01-02   TSLA  split    3.0    0.0   0.0              0.00         3.0   
3  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
4  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
5  2022-01-03   TSLA   sell    2.0   52.0   2.5            101.50        -2.0   
6  2022-01-03   TSLA    buy    2.0   35.0   3.0             73.00         2.0   
7  2022-01-03   AAPL    buy    2.5   50.5   1.5            127.75         2.5   
8  2022-01-04   AAPL    buy    4.0   50.5   2.0            204.00         4.0   

   cml_units  prev_units  prev_costs  cml_costs  cost_transaction  
0        2.0         0.0        0.00     105.00              0.00  
1        4.0         2.0      105.00     210.00              0.00  
2        7.0         4.0      210.00     210.00              0.00  
3        5.0         7.0      210.00     150.00             60.00  
4        7.0         5.0      150.00     223.00              0.00  
5        5.0         7.0      223.00     159.29             63.71  
6        7.0         5.0      159.29     232.29              0.00  
7        2.5         0.0        0.00     127.75              0.00  
8        6.5         2.5      127.75     331.75              0.00  

I have made this work in google sheets, but can´t make this work in pandas and in desperate need for help. :)

Thanks in advance!

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.