I have a dataframe with a column named DateTime whose values are spaced 5 seconds apart. However, a few rows are missing, and they can be identified from the time difference between the previous and the current row. I want to insert the missing rows and fill the other columns with the previous row's values.

My sample dataframe looks like this:

           DateTime       Price
2022-03-04 09:15:00    34526.00
2022-03-04 09:15:05    34487.00
2022-03-04 09:15:10    34470.00
2022-03-04 09:15:20    34466.00
2022-03-04 09:15:45    34448.00

The result should look like this:

           DateTime       Price
2022-03-04 09:15:00    34526.00
2022-03-04 09:15:05    34487.00
2022-03-04 09:15:10    34470.00
2022-03-04 09:15:15    34470.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:20    34466.00
2022-03-04 09:15:25    34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:30    34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:35    34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:40    34466.00 <----Insert Row and keep Price same as previous row
2022-03-04 09:15:45    34448.00
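The gaps described above can be found from the difference between consecutive timestamps. Here is a minimal sketch. It assumes DateTime is already a datetime64 column and that the expected spacing is exactly 5 seconds. The sample data is typed in by hand.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00", "2022-03-04 09:15:05", "2022-03-04 09:15:10",
        "2022-03-04 09:15:20", "2022-03-04 09:15:45",
    ]),
    "Price": [34526.00, 34487.00, 34470.00, 34466.00, 34448.00],
})

step = pd.Timedelta(seconds=5)               # assumed expected spacing
delta = df["DateTime"].diff()                # time since previous row (NaT for the first row)
gap_rows = df[delta > step]                  # rows that follow a gap
# How many 5-second rows are missing just before each of those rows
missing = (delta[delta > step] // step - 1).astype(int)

print(gap_rows)
print(missing.sum())
```

On the sample, this flags the 09:15:20 and 09:15:45 rows, with 1 and 4 rows missing before them. That matches the 5 rows marked in the expected result.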

4 Answers

An alternative, using an outer join:

import pandas as pd
# Full 5-second grid, outer-joined with the data, then forward-filled
t = pd.date_range(df.DateTime.min(), df.DateTime.max(), freq="5s", name="DateTime")
pd.merge(pd.DataFrame(t), df, how="outer").ffill()

Output:

             DateTime    Price
0 2022-03-04 09:15:00  34526.0
1 2022-03-04 09:15:05  34487.0
2 2022-03-04 09:15:10  34470.0
3 2022-03-04 09:15:15  34470.0
4 2022-03-04 09:15:20  34466.0
5 2022-03-04 09:15:25  34466.0
6 2022-03-04 09:15:30  34466.0
7 2022-03-04 09:15:35  34466.0
8 2022-03-04 09:15:40  34466.0
9 2022-03-04 09:15:45  34448.0
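You may also want to know which rows were inserted. The `indicator=True` option of `pd.merge` labels each row with its source. Below is a sketch of the outer-join approach with that addition, using the question's sample data typed in by hand. The `inserted` column name is hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00", "2022-03-04 09:15:05", "2022-03-04 09:15:10",
        "2022-03-04 09:15:20", "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

# Full 5-second grid between the first and last timestamp
grid = pd.DataFrame(
    {"DateTime": pd.date_range(df["DateTime"].min(), df["DateTime"].max(), freq="5s")}
)

# indicator=True adds a "_merge" column saying where each row came from
merged = pd.merge(grid, df, on="DateTime", how="outer", indicator=True)
merged["inserted"] = merged["_merge"] == "left_only"   # grid-only rows fill the gaps
result = merged.drop(columns="_merge").ffill()          # copy Price down into new rows

print(result)
```

An outer join keeps the union of keys. Any original row whose timestamp is not on the 5-second grid is therefore kept as well. The resample and asfreq answers return only the grid points.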

Try resample followed by ffill:

import pandas as pd

df['DateTime'] = pd.to_datetime(df['DateTime'])  # change to datetime dtype
df = df.set_index('DateTime')                    # move DateTime into the index

df_out = df.resample('5s').ffill()               # 5-second bins, forward-filled ('5S' is deprecated since pandas 2.2)

Output:

                       Price
DateTime                    
2022-03-04 09:15:00  34526.0
2022-03-04 09:15:05  34487.0
2022-03-04 09:15:10  34470.0
2022-03-04 09:15:15  34470.0
2022-03-04 09:15:20  34466.0
2022-03-04 09:15:25  34466.0
2022-03-04 09:15:30  34466.0
2022-03-04 09:15:35  34466.0
2022-03-04 09:15:40  34466.0
2022-03-04 09:15:45  34448.0
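The question keeps DateTime as an ordinary column, so you may want to move it back out of the index afterwards. Below is a self-contained sketch of this resample approach with a trailing `reset_index()`. The sample data is typed in by hand as strings.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "DateTime": [
        "2022-03-04 09:15:00", "2022-03-04 09:15:05", "2022-03-04 09:15:10",
        "2022-03-04 09:15:20", "2022-03-04 09:15:45",
    ],
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

df["DateTime"] = pd.to_datetime(df["DateTime"])    # strings -> datetime64
df_out = (
    df.set_index("DateTime")
      .resample("5s")       # one row per 5-second bin
      .ffill()              # empty bins take the previous row's values
      .reset_index()        # DateTime back to an ordinary column
)
print(df_out)
```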


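The pandas asfreq method is enough for this:

import pandas as pd

(df
 .set_index("DateTime")                 # asfreq needs a DatetimeIndex
 .asfreq(freq="5s", method="ffill")     # 5-second grid, gaps forward-filled
 .reset_index()                         # DateTime back to a column
)

Output:
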
             DateTime    Price
0 2022-03-04 09:15:00  34526.0
1 2022-03-04 09:15:05  34487.0
2 2022-03-04 09:15:10  34470.0
3 2022-03-04 09:15:15  34470.0
4 2022-03-04 09:15:20  34466.0
5 2022-03-04 09:15:25  34466.0
6 2022-03-04 09:15:30  34466.0
7 2022-03-04 09:15:35  34466.0
8 2022-03-04 09:15:40  34466.0
9 2022-03-04 09:15:45  34448.0
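On this sample, every original timestamp already lies on the 5-second grid, so asfreq and the resample answer should produce the same frame. The following is a quick check of that claim, with the data typed in by hand.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00", "2022-03-04 09:15:05", "2022-03-04 09:15:10",
        "2022-03-04 09:15:20", "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})
indexed = df.set_index("DateTime")

via_asfreq = indexed.asfreq(freq="5s", method="ffill")   # reindex onto the grid
via_resample = indexed.resample("5s").ffill()            # upsample, then forward-fill

print(via_asfreq.equals(via_resample))
```

asfreq only reindexes onto the grid. resample also offers aggregations such as mean or last, which matter when several rows fall into the same 5-second bin.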


Another option:

  1. Create a new dataframe with the full range of timestamps you want

    df_2 = pd.DataFrame({
        "DateTime": pd.date_range(start=df["DateTime"].min(), end=df["DateTime"].max(), freq="5s")
    })
    
  2. Merge the new and the original dataframe using an outer join

    df = pd.merge(df, df_2, how="outer").sort_values("DateTime")
    
  3. Fill the empty values with .ffill() (fillna(method="ffill") does the same but is deprecated since pandas 2.1)

    df = df.ffill()
    

Output:

             DateTime    Price
0 2022-03-04 09:15:00  34526.0
1 2022-03-04 09:15:05  34487.0
2 2022-03-04 09:15:10  34470.0
5 2022-03-04 09:15:15  34470.0
3 2022-03-04 09:15:20  34466.0
6 2022-03-04 09:15:25  34466.0
7 2022-03-04 09:15:30  34466.0
8 2022-03-04 09:15:35  34466.0
9 2022-03-04 09:15:40  34466.0
4 2022-03-04 09:15:45  34448.0

Resulting code:

import pandas as pd

df_2 = pd.DataFrame({
    "DateTime": pd.date_range(start=df["DateTime"].min(), end=df["DateTime"].max(), freq="5s")
})
df = pd.merge(df, df_2, how="outer").sort_values("DateTime")
df = df.ffill()

print(df)
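The output above keeps the shuffled index labels left over from sort_values. Appending `reset_index(drop=True)` gives a clean 0..n-1 index. Below is a self-contained sketch of the same three steps with that change. It assumes pandas 2.x, where `.ffill()` replaces the deprecated `fillna(method="ffill")`.

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2022-03-04 09:15:00", "2022-03-04 09:15:05", "2022-03-04 09:15:10",
        "2022-03-04 09:15:20", "2022-03-04 09:15:45",
    ]),
    "Price": [34526.0, 34487.0, 34470.0, 34466.0, 34448.0],
})

# Step 1: the full 5-second range of timestamps
df_2 = pd.DataFrame({
    "DateTime": pd.date_range(start=df["DateTime"].min(), end=df["DateTime"].max(), freq="5s")
})
# Step 2: outer-join and sort by time; reset_index renumbers rows 0..n-1
df = (
    pd.merge(df, df_2, how="outer")
      .sort_values("DateTime")
      .reset_index(drop=True)
)
# Step 3: forward-fill Price into the inserted rows
df = df.ffill()
print(df)
```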
