Speed up nested for loops on a dataframe

Question

I have a df called 'Series', consisting of over 2600 rows and 120 columns. Here is an extract:

         Date  Gasoil  Gasoline     Oil     Gas
0  2010-12-31  100.00    100.00  100.00  100.00
1  2011-01-03  103.97     99.88  100.18  105.55
2  2011-01-04  100.85     99.33   97.81  106.00
3  2011-01-05  102.02    100.61   98.82  101.54

I have created several empty dataframes, with the same column names and index as "Series". Each of these copy dfs need to accomodate some kind of function on the original Series df (moving averages, rolling percentiles, etc).

For example, one of these copy dataframes is called "log_returns". In every cell of log_returns, I need to calculate logarithmic returns, based on the corresponding column of the "Series" dataframe.

This is the output I have in mind. For example, log return of Gasoil on 2011-01-03 = log (103.97/100).

         Date  Gasoil Gasoline     Oil     Gas
0  2010-12-31                                 
1  2011-01-03   3.89%   -0.12%   0.18%   5.40%
2  2011-01-04  -3.05%   -0.55%  -2.39%   0.43%
3  2011-01-05   1.15%    1.28%   1.03%  -4.30%

In order to do that, I wrote a nested for loop:

rows_list = list(range(1, len(log_returns)))
columns_list = list(range(0, len(log_returns.columns)))

for row in rows_list:
    for column in columns_list:
        log_returns.iloc[row,column] = np.log(Series.iloc[row,column] / Series.iloc[row-1,column])

Unfortunately, the code is really slow to run. Are there faster alternatives? Thanks

HMReliable · Accepted Answer · 2021-08-16 21:16:18Z

2

After setting the Date as the index, the result can be obtained with a one-liner:

df = pd.DataFrame.from_records(
[(pd.to_datetime('2010-12-31'),  100.00,    100.00,  100.00,  100.00),
(pd.to_datetime('2011-01-03'),  103.97,     99.88,  100.18,  105.55),
(pd.to_datetime('2011-01-04'),  100.85,     99.33,   97.81,  106.00),
(pd.to_datetime('2011-01-05'),  102.02,    100.61,   98.82,  101.54)],
columns = ['Date',  'Gasoil',  'Gasoline',     'Oil',     'Gas'])
df = df.set_index('Date')
#this is the one-liner
np.log(df/df.shift(1))*100

answered Aug 16, 2021 at 21:16

HMReliable

8935 silver badges11 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Andrej Kesely · Accepted Answer · 2021-08-16 21:04:22Z

0

Try:

df.loc[:, "Gasoil":] = (
    np.log(df.loc[:, "Gasoil":] / df.loc[:, "Gasoil":].shift(-1)).shift() * -100
).fillna("")
print(df)

Prints:

         Date    Gasoil  Gasoline       Oil       Gas
0  2010-12-31                                        
1  2011-01-03  3.893221 -0.120072  0.179838  5.401459
2  2011-01-04 -3.046813 -0.552183 -2.394175  0.425432
3  2011-01-05  1.153461  1.280402  1.027319 -4.298628

answered Aug 16, 2021 at 21:04

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Collectives™ on Stack Overflow

Speed up nested for loops on a dataframe

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related