Inserting row in pandas dataframe based on time using datetime index

Question

I have two dataframes as shown below:

df1 = 
index                 value

2014-05-21 09:30:00    0.0       
2014-05-21 10:00:00   10.0
2014-05-21 10:30:00    3.0

.....

2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0

Its length is 55037. The other one is:

df2 = 
index                 value

2014-05-21 09:00:00    0.0
2014-05-21 10:00:00   13.0

.....

2017-07-10 21:00:00    1.6
2017-07-10 22:00:00   32.1
2017-07-10 23:00:00    7.6

Its length is 27519. The two dataframes above were produced by groupby with different time steps (30 minutes and 1 hour). Now I would like to make their beginning and end match automatically.

For example, "2014-05-21 09:00:00" is missing from df1. How can I write a function that takes df1, checks whether its beginning and end match df2, and returns a new df1 whose beginning and end match those of df2, with the value of any added row set to zero?

So in the end the new df1 would look as shown below:

new_df1 = 
index                 value

2014-05-21 09:00:00    0.0 
2014-05-21 09:30:00    0.0       
2014-05-21 10:00:00   10.0
2014-05-21 10:30:00    3.0

 .....

2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0

and its length is 55038 (so that it is exactly twice the length of df2).

Does anyone have an idea? Thanks in advance!

Try to make a minimal reproducible example. This looks like some sort of merge.
– Anton vBR, Commented Mar 31, 2018 at 22:37

jezrael · Accepted Answer · 2018-04-01 06:36:18Z


If you need to add all rows for timestamps in df2.index that are missing from df1, use combine_first:

df = df1.combine_first(df2)
print(df)
                     value
2014-05-21 09:00:00    0.0 <- timestamp does not exist in df1, df2 value used
2014-05-21 09:30:00    0.0
2014-05-21 10:00:00   10.0 <- timestamp exists in both, df1 value used
2014-05-21 10:30:00    3.0
2017-07-10 21:00:00    1.6
2017-07-10 22:00:00   32.1
2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0
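
Note that combine_first fills the added rows with values from df2. If the added rows should instead be zero and only the first and last timestamps of df2 need to be present, as the question describes, one possible approach is to reindex df1. The following is a minimal sketch, not part of the original answer; the helper name pad_to_match and the small sample frames are made up for illustration.

```python
import pandas as pd

def pad_to_match(df1, df2, fill_value=0.0):
    """Hypothetical helper: make df1 cover df2's first and last timestamps.

    Rows added to df1 get `fill_value`; existing rows keep their values.
    """
    # First and last timestamps of df2
    endpoints = pd.DatetimeIndex([df2.index.min(), df2.index.max()])
    # Union keeps every df1 timestamp and adds the endpoints if absent (sorted)
    new_index = df1.index.union(endpoints)
    # reindex fills only the labels that were missing from df1
    return df1.reindex(new_index, fill_value=fill_value)

# Small sample data modelled on the start of the frames in the question
df1 = pd.DataFrame(
    {"value": [0.0, 10.0, 3.0]},
    index=pd.to_datetime(
        ["2014-05-21 09:30:00", "2014-05-21 10:00:00", "2014-05-21 10:30:00"]
    ),
)
df2 = pd.DataFrame(
    {"value": [0.0, 13.0]},
    index=pd.to_datetime(["2014-05-21 09:00:00", "2014-05-21 10:00:00"]),
)

new_df1 = pad_to_match(df1, df2)
print(new_df1)
```

With this sample data, the 09:00 row is added with value 0.0 and the existing df1 rows are left unchanged.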


1 Comment

Chi Over a year ago

Thanks! It really helps :)

mirthbottle · Accepted Answer · 2018-04-01 06:23:10Z


You can select df1 where the index matches the beginning and end of df2.

df1.loc[df2.index.min():df2.index.max()]

You'll want to make sure that df1 is really sorted by its index, because .loc returns the rows between the beginning and end in index order. If the index holds datetime objects, calling df1.sort_index() will sort it correctly. You can get the beginning and end of df2 by calling df2.index.min() and df2.index.max().

If the timestamps are strings, but you can be sure that the items are in order, you can still use the .loc to select. Otherwise you'll want to convert them to datetime objects.
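
As a rough illustration of the selection described above, here is a minimal sketch with made-up sample data (the frames and the variable names start, end, and trimmed are assumptions for the example). Note that this selects the part of df1 that lies within df2's time range; it does not add rows that are missing from df1.

```python
import pandas as pd

# Sample df1 with a deliberately unsorted datetime index
df1 = pd.DataFrame(
    {"value": [3.0, 0.0, 10.0, 5.0]},
    index=pd.to_datetime(
        [
            "2014-05-21 10:30:00",
            "2014-05-21 09:30:00",
            "2014-05-21 10:00:00",
            "2014-05-21 11:00:00",
        ]
    ),
)
df2 = pd.DataFrame(
    {"value": [0.0, 13.0]},
    index=pd.to_datetime(["2014-05-21 09:00:00", "2014-05-21 10:00:00"]),
)

# Sort first so that label-based slicing behaves predictably
df1 = df1.sort_index()

# Beginning and end of df2
start, end = df2.index.min(), df2.index.max()

# Label slicing with .loc includes both endpoints
trimmed = df1.loc[start:end]
print(trimmed)
```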

