
How can I add a column to a MultiIndex DataFrame that holds the actual timestamp of each row, computed from a daily reference time? For example:

The reference time is 10:30 am every day.

The delta_mins column gives how many minutes before the reference time each row falls. For example, the row with delta_mins=20 for date 2019-12-2 corresponds to 10:10 am on 2019-12-2, which is 20 minutes before the reference time.

I want to add another column that shows the actual time each row represents, so that I can model the data as a time series. df1 is the existing DataFrame; the new DataFrame I want to generate looks like df2.

df1

date      delta_mins   sold_before
2019-12-2   20            a1  
            30            a2  
            40            a3    
            50            a4  
            60            a5  
2019-12-3   20            d1  
            30            d2  
            40            d3  
            50            d4  
            60            d5  
2019-12-4   20            g1  
            30            g2  
            40            g3  
            50            g4  
            60            g5  

REFERENCE TIME = 10:30 am

df2

date      delta_mins   sold_before     actual_time
2019-12-2   20            a1          2019-12-2 10:10
            30            a2          2019-12-2 10:00
            40            a3          2019-12-2 09:50
            50            a4          2019-12-2 09:40
            60            a5          2019-12-2 09:30          
2019-12-3   20            d1          2019-12-3 10:10  
            30            d2          2019-12-3 10:00
            40            d3          2019-12-3 09:50
            50            d4          2019-12-3 09:40
            60            d5          2019-12-3 09:30
2019-12-4   20            g1          2019-12-4 10:10
            30            g2          2019-12-4 10:00
            40            g3          2019-12-4 09:50
            50            g4          2019-12-4 09:40
            60            g5          2019-12-4 09:30
  • Isn’t this just a difference and a sum? Commented Dec 18, 2019 at 1:57
  • But it is in a MultiIndex df, and I can't get my head around how to do it with a MultiIndex. Commented Dec 18, 2019 at 2:20
  • Is the df1 a result of a groupby() function? Commented Dec 18, 2019 at 6:17
  • yes @davidbilla Commented Dec 18, 2019 at 7:48
  • @VishalAnand I was caught up in some other stuff, see my answer below. Hope this helps. Let me know if I missed anything Commented Dec 20, 2019 at 10:43

1 Answer

You can use reset_index() to turn the index levels back into columns, and then apply your operation:

df1 = df1.reset_index()  # keep the 'date' and 'delta_mins' levels as columns
df1['actual_time'] = pd.to_datetime(df1['date'] + ' 10:30:00') - pd.to_timedelta(df1['delta_mins'], unit='m')

If df1['date'] is of dtype datetime64, convert it to a string before concatenating:

df1['actual_time'] = pd.to_datetime(df1['date'].dt.strftime('%Y-%m-%d') + ' 10:30:00') - pd.to_timedelta(df1['delta_mins'], unit='m')

Output:

         date  delta_mins sold_before         actual_time
0   2019-12-2          20          a1 2019-12-02 10:10:00
1   2019-12-2          30          a2 2019-12-02 10:00:00
2   2019-12-2          40          a3 2019-12-02 09:50:00
3   2019-12-2          50          a4 2019-12-02 09:40:00
4   2019-12-2          60          a5 2019-12-02 09:30:00
5   2019-12-3          20          d1 2019-12-03 10:10:00
6   2019-12-3          30          d2 2019-12-03 10:00:00
7   2019-12-3          40          d3 2019-12-03 09:50:00
8   2019-12-3          50          d4 2019-12-03 09:40:00
9   2019-12-3          60          d5 2019-12-03 09:30:00
10  2019-12-4          20          g1 2019-12-04 10:10:00
11  2019-12-4          30          g2 2019-12-04 10:00:00
12  2019-12-4          40          g3 2019-12-04 09:50:00
13  2019-12-4          50          g4 2019-12-04 09:40:00
14  2019-12-4          60          g5 2019-12-04 09:30:00
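
For anyone who wants to reproduce a table like the one above end to end, here is a minimal self-contained sketch. The way df1 is built is an assumption (the question does not show its construction), and the name flat is only illustrative; the explicit format string is also my choice, since the dates are not zero-padded.

```python
import pandas as pd

# Hypothetical reconstruction of df1: a frame indexed by (date, delta_mins),
# similar to what a groupby() on those two keys would produce.
dates = ["2019-12-2", "2019-12-3", "2019-12-4"]
deltas = [20, 30, 40, 50, 60]
index = pd.MultiIndex.from_product([dates, deltas], names=["date", "delta_mins"])
sold = [f"{prefix}{i}" for prefix in "adg" for i in range(1, 6)]
df1 = pd.DataFrame({"sold_before": sold}, index=index)

# Move both index levels back into ordinary columns.
flat = df1.reset_index()

# Reference time (10:30) on each date, minus the per-row offset in minutes.
reference = pd.to_datetime(flat["date"] + " 10:30:00", format="%Y-%m-%d %H:%M:%S")
flat["actual_time"] = reference - pd.to_timedelta(flat["delta_mins"], unit="m")

print(flat)
```

pd.to_timedelta converts the whole delta_mins column in one vectorized call, which avoids building a Timedelta row by row with apply().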

Once you have it, rerun your df1.groupby() statement to group the data again if you need to.

df1 = df1.groupby(['date', 'delta_mins', 'sold_before']).first()  # placeholder aggregation; replace this with your groupby() statement

Output:

                                         actual_time
date      delta_mins sold_before                    
2019-12-2 20         a1          2019-12-02 10:10:00
          30         a2          2019-12-02 10:00:00
          40         a3          2019-12-02 09:50:00
          50         a4          2019-12-02 09:40:00
          60         a5          2019-12-02 09:30:00
2019-12-3 20         d1          2019-12-03 10:10:00
          30         d2          2019-12-03 10:00:00
          40         d3          2019-12-03 09:50:00
          50         d4          2019-12-03 09:40:00
          60         d5          2019-12-03 09:30:00
2019-12-4 20         g1          2019-12-04 10:10:00
          30         g2          2019-12-04 10:00:00
          40         g3          2019-12-04 09:50:00
          50         g4          2019-12-04 09:40:00
          60         g5          2019-12-04 09:30:00

OR:

You can use .transform() in your groupby() statement to expand the date column to all the rows and then apply the operation for 'actual_time'
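
No code is given here for the transform() route. As a related sketch that skips both reset_index() and groupby(), the column can also be computed directly from the index levels. Note that this uses MultiIndex.get_level_values() rather than transform(), and that the small df1 below is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical df1 with (date, delta_mins) as index levels.
index = pd.MultiIndex.from_product(
    [["2019-12-2", "2019-12-3"], [20, 30, 40]], names=["date", "delta_mins"]
)
df1 = pd.DataFrame(
    {"sold_before": ["a1", "a2", "a3", "d1", "d2", "d3"]}, index=index
)

# Pull each level out as a flat Index with one entry per row.
dates = pd.to_datetime(df1.index.get_level_values("date"), format="%Y-%m-%d")
offsets = pd.to_timedelta(df1.index.get_level_values("delta_mins"), unit="m")

# 10:30 on each date, minus the offset. The resulting DatetimeIndex is
# assigned by position, so the MultiIndex on df1 stays untouched.
df1["actual_time"] = dates + pd.Timedelta(hours=10, minutes=30) - offsets

print(df1)
```

Because the index is never reset, there is nothing to regroup afterwards.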
