
I have two dataframes, as shown below:

df1 = 
index                 value

2014-05-21 09:30:00    0.0       
2014-05-21 10:00:00   10.0
2014-05-21 10:30:00    3.0

.....

2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0

Its length is 55037. The other one is:

df2 = 
index                 value

2014-05-21 09:00:00    0.0
2014-05-21 10:00:00   13.0

.....

2017-07-10 21:00:00    1.6
2017-07-10 22:00:00   32.1
2017-07-10 23:00:00    7.6

Its length is 27519. The two dataframes above were produced by groupby with different time steps (30 minutes for df1, 1 hour for df2). Now I would like to make their first and last timestamps match automatically.

For example, "2014-05-21 09:00:00" is missing from df1. How can I write a function that takes df1, checks whether its beginning and end match those of df2, and returns a new df1 whose beginning and end match df2, with the value of any added row set to zero?

In the end, the new df1 would look like this:

new_df1 = 
index                 value

2014-05-21 09:00:00    0.0 
2014-05-21 09:30:00    0.0       
2014-05-21 10:00:00   10.0
2014-05-21 10:30:00    3.0

 .....

2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0

Its length would be 55038 (so that it is exactly twice the length of df2).

Does anyone have an idea? Thanks in advance!


2 Answers


If you need to add every row whose timestamp is in df2.index but missing from df1, use combine_first:

df = df1.combine_first(df2)
print (df)
                     value
2014-05-21 09:00:00    0.0 <- value does not exist in df1, df2 value used
2014-05-21 09:30:00    0.0
2014-05-21 10:00:00   10.0 <- value exists in both, df1 value used
2014-05-21 10:30:00    3.0
2017-07-10 21:00:00    1.6
2017-07-10 22:00:00   32.1
2017-07-10 22:30:00   18.3
2017-07-10 23:00:00    7.6
2017-07-10 23:30:00    0.0
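
For anyone who wants to try this, here is a minimal runnable sketch of the same call. The two small frames are stand-ins built from the first few rows shown in the question, not the full data. Note that combine_first fills the missing rows with df2's values rather than with zero; the added row is 0.0 here only because df2 happens to hold 0.0 at 09:00.

```python
import pandas as pd

# Small stand-ins for the question's frames (first rows only).
df1 = pd.DataFrame(
    {"value": [0.0, 10.0, 3.0]},
    index=pd.to_datetime(
        ["2014-05-21 09:30:00", "2014-05-21 10:00:00", "2014-05-21 10:30:00"]
    ),
)
df2 = pd.DataFrame(
    {"value": [0.0, 13.0]},
    index=pd.to_datetime(["2014-05-21 09:00:00", "2014-05-21 10:00:00"]),
)

# df1's values win wherever both frames have a row;
# df2 only supplies rows that df1 lacks.
df = df1.combine_first(df2)
print(df)
```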

1 Comment

Thanks! It really helps :)

You can select df1 where the index matches the beginning and end of df2.

df1.loc[df2.index.min():df2.index.max()]

Make sure df1 is sorted by its index first, because a label slice returns the rows that lie between the two endpoints in index order. If the index holds datetime objects, df1.sort_index() sorts it correctly; it returns a new dataframe, so assign the result back. The beginning and end of df2 are df2.index.min() and df2.index.max().

If the timestamps are strings but you can be sure the items are in order, you can still use .loc to select. Otherwise, convert them to datetime objects first, for example with pd.to_datetime.
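
A rough sketch of the slice, plus one possible way to get the padding the question asks for. Slicing with .loc can only trim df1 to df2's range; it cannot add the missing 09:00 row. The helper name match_edges and its freq parameter are made up for this illustration, and the reindex approach assumes df1 sits on a regular 30-minute grid, so any interior gap would also be filled with zero.

```python
import pandas as pd

def match_edges(df1, df2, freq="30min"):
    """Hypothetical helper: extend df1 so it covers df2's first and last
    timestamps, filling every newly created row with 0.0."""
    df1 = df1.sort_index()
    start = min(df1.index.min(), df2.index.min())
    end = max(df1.index.max(), df2.index.max())
    # Assumes a regular grid at `freq`; rows absent from df1 become 0.0.
    full_index = pd.date_range(start, end, freq=freq)
    return df1.reindex(full_index, fill_value=0.0)

# Small stand-ins for the question's frames (first rows only).
df1 = pd.DataFrame(
    {"value": [0.0, 10.0, 3.0]},
    index=pd.to_datetime(
        ["2014-05-21 09:30:00", "2014-05-21 10:00:00", "2014-05-21 10:30:00"]
    ),
)
df2 = pd.DataFrame(
    {"value": [0.0, 13.0]},
    index=pd.to_datetime(["2014-05-21 09:00:00", "2014-05-21 10:00:00"]),
)

# Trimming: keeps only the df1 rows that fall inside df2's range.
trimmed = df1.sort_index().loc[df2.index.min():df2.index.max()]

# Padding: adds the missing 09:00 row with value 0.0.
padded = match_edges(df1, df2)
print(padded)
```

On the full data from the question, the padded frame should gain exactly one row (2014-05-21 09:00:00), provided df1 really has no gaps on its 30-minute grid.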
