How to calculate with previous values in a Pandas MultiIndex DataFrame?

Question

I have the following MultiIndex dataframe.

                         Close     ATR     
Date          Symbol     
1990-01-01    A          24        2       
1990-01-01    B          72        7      
1990-01-01    C          40        3.4 

1990-01-02    A          21        1.5     
1990-01-02    B          65        6        
1990-01-02    C          45        4.2   

1990-01-03    A          19        2.5    
1990-01-03    B          70        6.3       
1990-01-03    C          51        5

I want to calculate three columns:

Shares = previous day's Equity * 0.02 / ATR, rounded down to a whole number
Profit = Shares * Close
Equity = previous day's Equity + sum of that day's Profit over all Symbols

Equity has an initial value of 10,000.
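
As a check on the formulas above, the 1990-01-02 values can be worked out by hand. A minimal sketch in plain Python, using only the numbers from the sample data (the variable names are illustrative, not part of the original question):

```python
import math

prev_equity = 10_000  # initial Equity from the question
risk = 0.02
# Symbol: (Close, ATR) on 1990-01-02, copied from the sample data
day2 = {"A": (21, 1.5), "B": (65, 6), "C": (45, 4.2)}

shares = {s: math.floor(prev_equity * risk / atr) for s, (close, atr) in day2.items()}
profit = {s: shares[s] * close for s, (close, atr) in day2.items()}
equity = prev_equity + sum(profit.values())

print(shares)  # {'A': 133, 'B': 33, 'C': 47}
print(profit)  # {'A': 2793, 'B': 2145, 'C': 2115}
print(equity)  # 17053
```

The same Equity value applies to every Symbol on that date, which is why it is repeated across the three rows of each day.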

The expected output is:

                         Close     ATR     Shares     Profit     Equity
Date          Symbol     
1990-01-01    A          24        2       0          0          10000
1990-01-01    B          72        7       0          0          10000
1990-01-01    C          40        3.4     0          0          10000

1990-01-02    A          21        1.5     133        2793       17053
1990-01-02    B          65        6       33         2145       17053
1990-01-02    C          45        4.2     47         2115       17053

1990-01-03    A          19        2.5     136        2584       26885
1990-01-03    B          70        6.3     54         3780       26885
1990-01-03    C          51        5       68         3468       26885

I suppose I need a for loop, or a function applied to each row, but I have two problems with that. First, I'm not sure how to write such a loop for a MultiIndex dataframe. Second, my dataframe is quite large (around 10 million rows), so I'm not sure a for loop is a good idea. How else can I create these columns?

rahlf23 · Accepted Answer · 2018-11-05 17:34:38Z

1

This solution can surely be cleaned up, but will produce your desired output. I've included your initial conditions in the construction of your sample dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': ['1990-01-01','1990-01-01','1990-01-01','1990-01-02','1990-01-02','1990-01-02','1990-01-03','1990-01-03','1990-01-03'],
    'Symbol': ['A','B','C','A','B','C','A','B','C'],
    'Close': [24, 72, 40, 21, 65, 45, 19, 70, 51],
    'ATR': [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
    'Shares': [0, 0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'Profit': [0, 0, 0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

Gives:

         Date Symbol  Close  ATR  Shares  Profit
0  1990-01-01      A     24  2.0     0.0     0.0
1  1990-01-01      B     72  7.0     0.0     0.0
2  1990-01-01      C     40  3.4     0.0     0.0
3  1990-01-02      A     21  1.5     NaN     NaN
4  1990-01-02      B     65  6.0     NaN     NaN
5  1990-01-02      C     45  4.2     NaN     NaN
6  1990-01-03      A     19  2.5     NaN     NaN
7  1990-01-03      B     70  6.3     NaN     NaN
8  1990-01-03      C     51  5.0     NaN     NaN

Then use groupby() with apply() and track your Equity globally. Took me a second to realize that the nature of this problem requires you to group on two separate columns individually (Symbol and Date):

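start = 10000
Equity = start  # running equity, updated as each Date group is processed

def calcs(x):
    global Equity

    # Skip the first Date group: its Shares and Profit are the given initial values
    if x.index[0] == 0:
        return x

    x = x.copy()  # avoid modifying the group in place
    x['Shares'] = np.floor(Equity * 0.02 / x['ATR'])  # uses the previous day's Equity
    x['Profit'] = x['Shares'] * x['Close']
    Equity += x['Profit'].sum()
    return x

# Group on the Date values passed as an array, so the Date column stays in each group;
# group_keys=False keeps the original row index instead of prepending the group key
df = df.groupby(df['Date'].to_numpy(), group_keys=False).apply(calcs)
df['Equity'] = df.groupby('Date')['Profit'].transform('sum')  # total Profit per Date
df['Equity'] = df.groupby('Symbol')['Equity'].cumsum() + start  # running total per Symbol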

This yields:

         Date Symbol  Close  ATR  Shares  Profit   Equity
0  1990-01-01      A     24  2.0     0.0     0.0  10000.0
1  1990-01-01      B     72  7.0     0.0     0.0  10000.0
2  1990-01-01      C     40  3.4     0.0     0.0  10000.0
3  1990-01-02      A     21  1.5   133.0  2793.0  17053.0
4  1990-01-02      B     65  6.0    33.0  2145.0  17053.0
5  1990-01-02      C     45  4.2    47.0  2115.0  17053.0
6  1990-01-03      A     19  2.5   136.0  2584.0  26885.0
7  1990-01-03      B     70  6.3    54.0  3780.0  26885.0
8  1990-01-03      C     51  5.0    68.0  3468.0  26885.0
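
For a large frame, one possible variation is to avoid the global variable and loop once per date rather than once per row. The sketch below is not the answer's method: it pivots Close and ATR to a date-by-symbol layout and carries Equity forward in a NumPy array. The helper name `add_equity_columns` and its parameters are hypothetical, and it assumes a (Date, Symbol) MultiIndex in which each pair appears once.

```python
import numpy as np
import pandas as pd

def add_equity_columns(df, start_equity=10_000, risk=0.02):
    """Hypothetical helper. Expects a (Date, Symbol) MultiIndex and Close/ATR columns."""
    close = df["Close"].unstack("Symbol")  # rows: Date, columns: Symbol
    atr = df["ATR"].unstack("Symbol")
    close_v = close.to_numpy(dtype=float)
    atr_v = atr.to_numpy(dtype=float)

    shares = np.zeros_like(close_v)  # the first date keeps Shares = 0
    equity = np.full(len(close), float(start_equity))

    # Equity depends on its own previous value, so dates are processed in order;
    # within each date the arithmetic is vectorized across symbols.
    for i in range(1, len(close)):
        shares[i] = np.floor(equity[i - 1] * risk / atr_v[i])
        equity[i] = equity[i - 1] + np.nansum(shares[i] * close_v[i])

    # Map each original row back to its (date, symbol) cell.
    d = close.index.get_indexer(df.index.get_level_values("Date"))
    s = close.columns.get_indexer(df.index.get_level_values("Symbol"))
    out = df.copy()
    out["Shares"] = shares[d, s]
    out["Profit"] = out["Shares"] * out["Close"]
    out["Equity"] = equity[d]
    return out

sample = pd.DataFrame({
    "Date": ["1990-01-01"] * 3 + ["1990-01-02"] * 3 + ["1990-01-03"] * 3,
    "Symbol": ["A", "B", "C"] * 3,
    "Close": [24, 72, 40, 21, 65, 45, 19, 70, 51],
    "ATR": [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
}).set_index(["Date", "Symbol"])

result = add_equity_columns(sample)
print(result)
```

The loop still runs sequentially, but its length is the number of distinct dates, which is typically far smaller than the number of rows.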

edited Nov 5, 2018 at 17:34

answered Nov 5, 2018 at 17:27

rahlf23


4 Comments

mmaarrttoonn Over a year ago

This works. I have to add that since I had a MultiIndex dataframe, I had to perform a .reset_index() on the dataframe. Furthermore, I got an error for the line x['Shares'] = np.floor(Equity*0.02/x['ATR']): AttributeError: 'float' object has no attribute 'floor'. I solved this by converting it to x['Shares'] = np.floor((Equity*0.02/x['ATR']).astype(float)). Thank you.
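
The two adjustments described in this comment might look like the following sketch. It assumes the ATR column was stored with object dtype, which is one way np.floor can fail as reported; the small frame is illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative frame: (Date, Symbol) MultiIndex and an ATR column stored as object dtype
idx = pd.MultiIndex.from_product(
    [["1990-01-02"], ["A", "B", "C"]], names=["Date", "Symbol"]
)
df = pd.DataFrame(
    {"Close": [21, 65, 45], "ATR": pd.Series([1.5, 6, 4.2], index=idx, dtype=object)},
    index=idx,
)

flat = df.reset_index()                  # Date and Symbol become ordinary columns
flat["ATR"] = flat["ATR"].astype(float)  # np.floor needs a numeric dtype

shares = np.floor(10_000 * 0.02 / flat["ATR"])
print(shares.tolist())  # [133.0, 33.0, 47.0]
```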

rahlf23 Over a year ago

Glad this worked for you, and I appreciate you taking the time to include the slight modifications you had to make to the solution!

mmaarrttoonn Over a year ago

What if I want to bound the calculation of Shares to a condition included in another column Cond, otherwise forward fill Shares values grouped by Symbol? Like if x[Cond] == True then np.floor((Equity*0.02/x['ATR']).astype(float)) else ffill() previous Shares value. I've tried to play around with the function you wrote but I'm struggling to access previous row's Shares value. Do you have any suggestion?

rahlf23 Over a year ago

This would be better served as a separate question. The reason I declared Equity as global was to be able to continuously query the previous value and then proceed to increment it during the current group iteration. You may need to do something similar, but posting as another question may get better visibility.
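
One way the conditional variant asked about above could be approached is to keep the previous date's Shares in an array and overwrite them only where the condition holds. This is a sketch under assumptions, not a tested answer to that follow-up: the helper name `shares_with_condition` is hypothetical, Cond is assumed to be a boolean column, and the frame is assumed to have a (Date, Symbol) MultiIndex with each pair appearing once.

```python
import numpy as np
import pandas as pd

def shares_with_condition(df, start_equity=10_000, risk=0.02):
    """Hypothetical helper. Expects a (Date, Symbol) MultiIndex and Close, ATR, Cond columns."""
    close = df["Close"].unstack("Symbol")  # rows: Date, columns: Symbol
    close_v = close.to_numpy(dtype=float)
    atr_v = df["ATR"].unstack("Symbol").to_numpy(dtype=float)
    cond_v = df["Cond"].unstack("Symbol").to_numpy(dtype=bool)

    shares = np.zeros_like(close_v)  # the first date keeps Shares = 0
    equity = float(start_equity)
    for i in range(1, len(close)):
        fresh = np.floor(equity * risk / atr_v[i])
        # Recompute where Cond is True, otherwise carry the symbol's previous Shares
        shares[i] = np.where(cond_v[i], fresh, shares[i - 1])
        equity += (shares[i] * close_v[i]).sum()

    # Map each original row back to its (date, symbol) cell.
    d = close.index.get_indexer(df.index.get_level_values("Date"))
    s = close.columns.get_indexer(df.index.get_level_values("Symbol"))
    out = df.copy()
    out["Shares"] = shares[d, s]
    return out

sample = pd.DataFrame({
    "Date": ["1990-01-01"] * 3 + ["1990-01-02"] * 3 + ["1990-01-03"] * 3,
    "Symbol": ["A", "B", "C"] * 3,
    "Close": [24, 72, 40, 21, 65, 45, 19, 70, 51],
    "ATR": [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
    "Cond": [True] * 6 + [False, True, True],  # made-up flags for illustration
}).set_index(["Date", "Symbol"])

print(shares_with_condition(sample)["Shares"].tolist())
# [0.0, 0.0, 0.0, 133.0, 33.0, 47.0, 133.0, 54.0, 68.0]
```

With these made-up flags, Symbol A keeps its 133 shares on 1990-01-03 because its Cond is False, while B and C are recomputed from the previous day's Equity.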

lfkopp · Accepted Answer · 2018-11-05 16:53:35Z

0

Can you try using shift with groupby? Once you have the previous row's value for each symbol in its own column, the remaining column operations are straightforward.

table2['previous'] = table2.groupby(level='symbol')['close'].shift(1)  # symbol is an index level here

table2

date        symbol  close  atr  previous
1990-01-01  A          24  2         NaN
            B          72  7         NaN
            C          40  3.4       NaN
1990-01-02  A          21  1.5        24
            B          65  6          72
            C          45  4.2        40
1990-01-03  A          19  2.5        21
            B          70  6.3        65
            C          51  5          45
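
A self-contained version of this shift approach, as a sketch: the frame is rebuilt from the question's sample data with lowercase column names to match `table2`, and date and symbol are assumed to be index levels.

```python
import pandas as pd

table2 = pd.DataFrame({
    "date": ["1990-01-01"] * 3 + ["1990-01-02"] * 3 + ["1990-01-03"] * 3,
    "symbol": ["A", "B", "C"] * 3,
    "close": [24, 72, 40, 21, 65, 45, 19, 70, 51],
    "atr": [2, 7, 3.4, 1.5, 6, 4.2, 2.5, 6.3, 5],
}).set_index(["date", "symbol"])

# Within each symbol, shift(1) moves values down one row,
# so every row sees the previous date's close for the same symbol.
table2["previous"] = table2.groupby(level="symbol")["close"].shift(1)
print(table2)
```

Note that shift reads the previous value of a column that already exists. Equity depends on its own previous value, so shift alone may not be enough to build that column.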

answered Nov 5, 2018 at 16:53

lfkopp
