
Is there a pandas way to do this?

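from datetime import timedelta

predicted_sells = []
# Iterate over the timestamp index itself; df.values holds only the column data
for index_tms in df.index:
    delta = index_tms + timedelta(hours=1)
    try:
        # 'sell' is the column name used in the example tables below
        sells_to_predict = df.loc[delta, 'sell']
    except KeyError:
        # No row exists one hour later
        sells_to_predict = None
    predicted_sells.append(sells_to_predict)

df['sell_to_predict'] = predicted_sells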

Example explanation:

sell is the number of cars I sold at time tms, and sell_to_predict is the number of cars I sold in the following hour, which is what I want to predict. So I want to build a new column that holds, at time tms, the number of cars I will sell at time tms + 1h.

Before my code runs, the data looks like this:

                tms  sell 
2015-11-23 15:00:00     6               
2015-11-23 16:00:00     2               
2015-11-23 17:00:00    10         

Afterwards, it looks like this:

                tms  sell  sell_to_predict
2015-11-23 15:00:00     6                2
2015-11-23 16:00:00     2               10
2015-11-23 17:00:00    10              NaN

I create a new column by shifting another column, but it is not a shift by a fixed number of rows. It is a shift based on the index (here the index is a timestamp).
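One possible pandas-native version of the loop above, as a sketch only: Series.shift accepts a freq argument that moves the index labels rather than the values, and column assignment then aligns on the timestamp index. The sample frame is made up to mirror the example, with an extra 19:00 row added to show a gap, and the sketch assumes the index is a unique DatetimeIndex.

```python
import pandas as pd

# Hypothetical sample data mirroring the example, plus a gap at 18:00
df = pd.DataFrame(
    {"sell": [6, 2, 10, 5]},
    index=pd.to_datetime([
        "2015-11-23 15:00:00",
        "2015-11-23 16:00:00",
        "2015-11-23 17:00:00",
        "2015-11-23 19:00:00",
    ]),
)
df.index.name = "tms"

# Move every label one hour earlier: the sale made at 16:00 is now labelled 15:00
one_hour_later = df["sell"].shift(periods=-1, freq=pd.Timedelta(hours=1))

# Assignment aligns on the index, so timestamps with no row one hour later get NaN
df["sell_to_predict"] = one_hour_later
print(df)
```

Because the alignment is done on timestamps, the 17:00 row gets NaN here instead of picking up the 19:00 value, which is what a purely positional shift(-1) would do.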

Here is another, slightly more complex example:

Before:

            sell  random
store hour              
1     1        1       9
      2        7       7
2     1        4       3
      2        2       3

After:

            sell  random  predict
store hour              
1     1        1       9        7
      2        7       7      NaN
2     1        4       3        2
      2        2       3      NaN
  • Can you provide a small example of the dataframe you would like to modify, and an example of what you are hoping to get out? From the example you provided it is unclear what index_tms and ['old_column'] actually represent. For instance why would the following not work? df['new_column'] = df.index + timedelta(hours=1) Commented Nov 23, 2015 at 17:03
  • Imagine I want to predict the number of cars I will sell in one hour. In 'old_column' I have the number of cars I sold at the time I am using as an index. For that precise time, I want the number of cars sold one hour later, so I want to create a 'new_column' containing the number of cars sold one hour later. I will edit my question to illustrate that. Commented Nov 23, 2015 at 17:11

2 Answers


Have you tried shift?

e.g.

import pandas as pd

df = pd.DataFrame(list(range(4)))
df.columns = ['sold']
# shift(-1) moves values up by one row, so each row sees the next row's value
df['predict'] = df.sold.shift(-1)

df
   sold  predict
0     0        1
1     1        2
2     2        3
3     3      NaN

1 Comment

This is not what I want, since it is not based on an index comparison: the adjacent row is not always the one whose index is one hour away. Moreover, although I didn't say so in my question, I have one more index level (say, the id of the car store), so the adjacent row can refer to a sale that happened in another store.

The answer was to resample so that there are no gaps in the index, and then apply the answer to this question: How do you shift Pandas DataFrame with a multiindex?
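A rough sketch of what those two steps could look like, assuming a store column, a tms timestamp index and an hourly grid. The sample data and the names raw and hourly are invented for illustration, and the per-store shift is one plausible reading of the linked answer rather than a copy of it.

```python
import pandas as pd

# Hypothetical data: store 1 has no row at 17:00
raw = pd.DataFrame(
    {
        "store": [1, 1, 1, 2, 2],
        "tms": pd.to_datetime([
            "2015-11-23 15:00", "2015-11-23 16:00", "2015-11-23 18:00",
            "2015-11-23 15:00", "2015-11-23 16:00",
        ]),
        "sell": [6, 2, 10, 4, 3],
    }
).set_index("tms")

# Step 1: resample each store onto an hourly grid; missing hours become NaN rows
hourly = (
    raw.groupby("store")["sell"]
    .resample(pd.Timedelta(hours=1))
    .first()
    .to_frame("sell")
)

# Step 2: with no gaps left, a positional shift within each store is a one-hour shift
hourly["predict"] = hourly.groupby(level="store")["sell"].shift(-1)
print(hourly)
```

Resampling adds rows for the missing hours, so the result is longer than the input; those filler rows can be dropped afterwards with dropna(subset=["sell"]) if they are not wanted.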
