Is there a pandas way to do that:
predicted_sells = []
for row in df.values:
index_tms = row[0]
delta = index_tms + timedelta(hours=1)
try:
sells_to_predict = df.loc[delta]['cars_sold']
except KeyError:
new_element = None
predicted_sells.append(sells_to_predict)
df['sell_to_predict'] = predicted_sells
example explanation:
sell is the number of cars I sold at the time tms. sell_to_predict is the number of cars I sold the hour after. I want to predict that. So I want to build a new column containing at the time tms the number of cars I will sell at the time tms+1h
before my code it looks like that
tms sell
2015-11-23 15:00:00 6
2015-11-23 16:00:00 2
2015-11-23 17:00:00 10
after it looks like that
tms sell sell_to_predict
2015-11-23 15:00:00 6 2
2015-11-23 16:00:00 2 10
2015-11-23 17:00:00 10 NaN
I create a new column based on a shift of an other column, but that's not a shift in number of columns. That's a shift based on an index (here the index is a timestamp)
Here is an other example, little more complex :
before :
sell random
store hour
1 1 1 9
2 7 7
2 1 4 3
2 2 3
after :
sell random predict
store hour
1 1 1 9 7
2 7 7 NaN
2 1 4 3 2
2 2 3 NaN
index_tmsand['old_column]actually represent. For instance why would the following not work?df['new_column'] = df.index + timedelta(hours=1)