My DataFrame is:

            Date        Open        High         Low       Close   Adj Close     Volume
5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215   52989300
5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825   75443000
5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364   61368800
5935  2016-08-23  219.250000  219.600006  218.899994  218.970001  207.587479   53399200
5936  2016-08-24  218.800003  218.910004  217.360001  217.850006  206.525711   71728900
5937  2016-08-25  217.399994  218.190002  217.220001  217.699997  206.383514   69224800
5938  2016-08-26  217.919998  219.119995  216.250000  217.289993  205.994827  122506300
5939  2016-08-29  217.440002  218.669998  217.399994  218.360001  207.009201   68606100
5940  2016-08-30  218.259995  218.589996  217.350006  218.000000  206.667908   58114500
5941  2016-08-31  217.610001  217.750000  216.470001  217.380005  206.080124   85269500
5942  2016-09-01  217.369995  217.729996  216.029999  217.389999  206.089645   97844200
5943  2016-09-02  218.389999  218.869995  217.699997  218.369995  207.018692   79293900
5944  2016-09-06  218.699997  219.119995  217.860001  219.029999  207.644394   56702100
5945  2016-09-07  218.839996  219.220001  218.300003  219.009995  207.625412   76554900
5946  2016-09-08  218.619995  218.940002  218.149994  218.509995  207.151398   73011600
5947  2016-09-09  216.970001  217.029999  213.250000  213.279999  202.193268  221589100
5948  2016-09-12  212.389999  216.809998  212.309998  216.339996  205.094223  168110900
5949  2016-09-13  214.839996  215.149994  212.500000  213.229996  202.145859  182828800
5950  2016-09-14  213.289993  214.699997  212.500000  213.149994  202.070023  134185500
5951  2016-09-15  212.960007  215.729996  212.750000  215.279999  204.089294  134427900
5952  2016-09-16  213.479996  213.690002  212.570007  213.369995  203.300430  155236400

Currently, I'm doing this:

        # Most recent open (Series.get_values() was removed in pandas 1.0)
        state['open_price'] = lookback.Open.iloc[-1]

        # One row at a time: the first row gets suffix LOOKBACK_DAYS,
        # each later row one less
        for ind, row in lookback.reset_index().iterrows():
            if ind < self.LOOKBACK_DAYS:
                state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
                state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
                state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume

But this is exceedingly slow. Is there some more vectorized way to do this?

I am trying to convert this to:

cash          1.000000e+05
num_shares    0.000000e+00
cost_basis    0.000000e+00
open_price    1.316900e+02
close_20      1.301100e+02
open_20       1.302600e+02
volume_20     4.670420e+07
close_19      1.302100e+02
open_19       1.299900e+02
volume_19     4.320920e+07
close_18      1.300200e+02
open_18       1.300300e+02
volume_18     3.252300e+07
close_17      1.292200e+02
open_17       1.299300e+02
volume_17     8.207990e+07
close_16      1.300300e+02
open_16       1.294100e+02
volume_16     6.150570e+07
close_15      1.298000e+02
open_15       1.301100e+02
volume_15     7.057170e+07
close_14      1.298300e+02
open_14       1.300200e+02
volume_14     6.292560e+07
close_13      1.297300e+02
open_13       1.300700e+02
volume_13     6.162470e+07
close_12      1.305600e+02
open_12       1.297300e+02
                  ...     
close_10      1.308700e+02
open_10       1.308500e+02
volume_10     5.790620e+07
close_9       1.295400e+02
open_9        1.310600e+02
volume_9      8.018090e+07
close_8       1.297400e+02
open_8        1.297400e+02
volume_8      4.149650e+07
close_7       1.286400e+02
open_7        1.298500e+02
volume_7      7.279940e+07
close_6       1.288800e+02
open_6        1.287700e+02
volume_6      4.303370e+07
close_5       1.287100e+02
open_5        1.285900e+02
volume_5      5.105180e+07
close_4       1.286600e+02
open_4        1.288300e+02
volume_4      6.416770e+07
close_3       1.307000e+02
open_3        1.289300e+02
volume_3      9.253180e+07
close_2       1.309500e+02
open_2        1.307500e+02
volume_2      8.726900e+07
close_1       1.311300e+02
open_1        1.310000e+02
volume_1      8.600550e+07
Length: 64, dtype: float64
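For reference, the loop from the question can be run end to end on a small frame, which gives a baseline to compare any faster version against. This is a sketch: the three-row `lookback` frame (rows taken from the table above), `LOOKBACK_DAYS = 3` and the function name `loop_state` are hypothetical stand-ins for the question's attributes, and `lookback.Open.iloc[-1]` replaces `get_values()`, which newer pandas no longer has.

```python
import pandas as pd

LOOKBACK_DAYS = 3  # hypothetical; the question uses self.LOOKBACK_DAYS

# Hypothetical three-row lookback window in the question's column layout
lookback = pd.DataFrame(
    {
        "Date": ["2016-09-14", "2016-09-15", "2016-09-16"],
        "Open": [213.289993, 212.960007, 213.479996],
        "Close": [213.149994, 215.279999, 213.369995],
        "Volume": [134185500, 134427900, 155236400],
    },
    index=[5950, 5951, 5952],
)


def loop_state(lookback, lookback_days):
    """Row-by-row version of the question's code (the slow baseline)."""
    state = {}
    state["open_price"] = lookback.Open.iloc[-1]
    for ind, row in lookback.reset_index().iterrows():
        if ind < lookback_days:
            # Oldest row gets suffix lookback_days, each later row one less
            state["close_" + str(lookback_days - ind)] = row.Close
            state["open_" + str(lookback_days - ind)] = row.Open
            state["volume_" + str(lookback_days - ind)] = row.Volume
    return pd.Series(state)


baseline = loop_state(lookback, LOOKBACK_DAYS)
print(baseline)
```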
  • Define "transform" please? Commented Apr 9, 2019 at 20:14
  • Transform isn't the right word. I guess convert it to a series? Commented Apr 9, 2019 at 20:15
  • It would be helpful to understand what the code is trying to do, and what the expected output is as well. Commented Apr 9, 2019 at 20:16
  • @coldspeed Updated post to reflect expected output Commented Apr 9, 2019 at 20:20

1 Answer

One way is to cheat and work with the underlying NumPy array via .values.

I'll also show the steps I took to build an equivalent example:

import pandas as pd
from itertools import product

initial = ['cash', 'num_shares', 'somethingsomething']
initial_series = pd.Series([1, 2, 3], index=initial)
print(initial_series)
#Output:
cash                  1
num_shares            2
somethingsomething    3
dtype: int64

These are just mocked stand-ins for the first few entries of your expected output series (cash, num_shares and so on).

df = pd.read_clipboard(sep=r'\s\s+')  # pure magic: parses the table copied from the question
print(df.head())
#Output:
            Date        Open    ...      Adj Close    Volume
5932  2016-08-18  218.339996    ...     207.483215  52989300
5933  2016-08-19  218.309998    ...     207.179825  75443000
5934  2016-08-22  218.259995    ...     207.170364  61368800
5935  2016-08-23  219.250000    ...     207.587479  53399200
5936  2016-08-24  218.800003    ...     206.525711  71728900

[5 rows x 7 columns]

df is now essentially the DataFrame from your question. The clipboard trick comes from here and is a good read for writing pandas MCVEs.
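If no clipboard is available (for example on a headless machine), the same frame can be built from a pasted string. This sketch relies on `read_clipboard` handing the clipboard text to `read_csv`; here `pd.read_csv` reads an `io.StringIO` with the same separator and the Python engine, which regex separators need. Only the first three rows of the table are included, and the variable name `text` is just for illustration.

```python
import io

import pandas as pd

# First rows of the table from the question, pasted as plain text
text = """\
            Date        Open        High         Low       Close   Adj Close     Volume
5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215   52989300
5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825   75443000
5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364   61368800
"""

# Split on runs of two or more whitespace characters so "Adj Close" stays one name.
# The header has one fewer field than the data rows, so the first column becomes the index.
df = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")
print(df)
```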

to_select = ['Close', 'Open', 'Volume']
SOMELOOKBACK = 6000 #mocked
final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]

This builds the new index labels, which look like this:

['Close_68',
 'Open_68',
 'Volume_68',
 'Close_67',
 'Open_67',
 'Volume_67',
...
]
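The ordering comes from `itertools.product`, which cycles its last argument fastest: each day number is paired with all three column names before moving on to the next day. A tiny stand-alone check, with made-up day offsets:

```python
from itertools import product

to_select = ["Close", "Open", "Volume"]
days_back = [2, 1]  # hypothetical day offsets, oldest first

# Same comprehension shape as final_index above
labels = [f"{name}_{day}" for day, name in product(days_back, to_select)]
print(labels)
# ['Close_2', 'Open_2', 'Volume_2', 'Close_1', 'Open_1', 'Volume_1']
```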

Now select the relevant columns from the DataFrame, use .values to get a 2D array, and flatten it to get the final series.

final_series = pd.Series(df[to_select].values.flatten(), index=final_index)
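The labels and the values line up because `flatten()` reads the array in row-major (C) order by default: all of the first row, then all of the second, which is the same order `product` produced for the labels. A small check on a two-row frame with made-up numbers (`df_small` is hypothetical):

```python
import pandas as pd

df_small = pd.DataFrame(
    {"Close": [1.0, 2.0], "Open": [10.0, 20.0], "Volume": [100, 200]}
)

# Mixed int/float columns are upcast to one float64 array, then flattened row by row
flat = df_small[["Close", "Open", "Volume"]].values.flatten()
print(flat.tolist())  # [1.0, 10.0, 100.0, 2.0, 20.0, 200.0]
```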

result = pd.concat([initial_series, final_series])  # Series.append was removed in pandas 2.0
#Output:
cash                  1.000000e+00
num_shares            2.000000e+00
somethingsomething    3.000000e+00
Close_68              2.188600e+02
Open_68               2.183400e+02
Volume_68             5.298930e+07
Close_67              2.185400e+02
Open_67               2.183100e+02
Volume_67             7.544300e+07
Close_66              2.185300e+02
Open_66               2.182600e+02
Volume_66             6.136880e+07
...
Close_48              2.133700e+02
Open_48               2.134800e+02
Volume_48             1.552364e+08
Length: 66, dtype: float64
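Putting the pieces together in the naming scheme from the question (lowercase `close_N`, `open_N`, `volume_N`, with N counting down from the lookback length to 1), a sketch might look like this. The function name `lookback_to_series` and its arguments are hypothetical; it assumes `lookback` is sorted oldest to newest, keeps the same rows and suffixes as the question's `ind < LOOKBACK_DAYS` loop, takes `open_price` from the last row as the question does, and uses `pd.concat` since `Series.append` no longer exists in pandas 2.0. The sample rows come from the table in the question.

```python
import pandas as pd


def lookback_to_series(lookback, lookback_days, initial):
    """Flatten Close/Open/Volume rows into one labelled Series (hypothetical helper)."""
    cols = ["Close", "Open", "Volume"]
    # Same rows the loop keeps: positions 0 .. lookback_days - 1
    window = lookback.iloc[:lookback_days]
    # Same suffixes as the loop: lookback_days - position, column names lowercased
    labels = [
        f"{c.lower()}_{lookback_days - i}" for i in range(len(window)) for c in cols
    ]
    # Row-major ravel lines the values up with the labels above
    values = window[cols].to_numpy(dtype=float).ravel()
    head = pd.Series(initial, dtype=float)
    head["open_price"] = lookback["Open"].iloc[-1]
    return pd.concat([head, pd.Series(values, index=labels)])


lookback = pd.DataFrame(
    {
        "Open": [218.389999, 218.699997, 218.839996, 218.619995],
        "Close": [218.369995, 219.029999, 219.009995, 218.509995],
        "Volume": [79293900, 56702100, 76554900, 73011600],
    }
)
state = lookback_to_series(
    lookback, 3, {"cash": 1e5, "num_shares": 0.0, "cost_basis": 0.0}
)
print(state)
```

No Python-level loop touches the data itself; the only loop builds the label strings, which is cheap compared with `iterrows`.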