1

I have a DataFrame (df) with a DateTime index as follows:

DateTime
2017-01-02 15:00:00
2017-01-02 16:00:00
2017-01-02 18:00:00
....
....
2019-12-07 22:00:00
2019-12-07 23:00:00

Now I want to know whether any timestamps are missing at the 1-hour interval. For instance, one reading is missing before the 3rd entry, because the index jumps from 16:00 to 18:00. Is it possible to detect this?

2 Answers

4

Create a date_range spanning the minimal and maximal datetimes, then keep only the values that are absent from the DateTime column by using Index.isin with boolean indexing, where ~ inverts the mask:

print (df)
             DateTime
0 2017-01-02 15:00:00
1 2017-01-02 16:00:00
2 2017-01-02 18:00:00


r = pd.date_range(df['DateTime'].min(), df['DateTime'].max(), freq='H')
print (r)
DatetimeIndex(['2017-01-02 15:00:00', '2017-01-02 16:00:00',
               '2017-01-02 17:00:00', '2017-01-02 18:00:00'],
              dtype='datetime64[ns]', freq='H')

out = r[~r.isin(df['DateTime'])]
print (out)
DatetimeIndex(['2017-01-02 17:00:00'], dtype='datetime64[ns]', freq='H')
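
Since the question has DateTime as the index rather than as a column, the same idea might be sketched directly against df.index. This is only an illustration under assumptions: the val column is a hypothetical placeholder, Index.difference is swapped in for the ~isin mask, and the frequency is passed as pd.Timedelta(hours=1) so the sketch does not depend on the spelling of the hourly alias (newer pandas releases prefer lowercase 'h').

```python
import pandas as pd

# Sample data mirroring the question: DateTime is the index and 17:00 is absent.
# The "val" column is a hypothetical placeholder, not part of the original data.
df = pd.DataFrame(
    {"val": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2017-01-02 15:00:00", "2017-01-02 16:00:00", "2017-01-02 18:00:00"],
        name="DateTime",
    ),
)

# Full hourly range between the first and last timestamps of the index.
full = pd.date_range(df.index.min(), df.index.max(), freq=pd.Timedelta(hours=1))

# Timestamps of the full range that do not appear in the index.
missing = full.difference(df.index)
print(missing)
```

Index.difference returns the elements of the full range that are not in df.index, which should match the result of the ~isin filter.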

Another idea is to create a DatetimeIndex with a helper column, change the frequency with Series.asfreq, and select the index values where the result is missing:

s = df[['DateTime']].assign(val=1).set_index('DateTime')['val'].asfreq('H')
print (s)
DateTime
2017-01-02 15:00:00    1.0
2017-01-02 16:00:00    1.0
2017-01-02 17:00:00    NaN
2017-01-02 18:00:00    1.0
Freq: H, Name: val, dtype: float64

out = s.index[s.isna()]
print (out)
DatetimeIndex(['2017-01-02 17:00:00'], dtype='datetime64[ns]', name='DateTime', freq='H')
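
If it is more useful to see where each gap sits and how many readings it spans, a different technique (not one of the two above) is to difference consecutive timestamps. A minimal sketch, assuming the index is sorted in ascending order and that readings fall exactly on the hour:

```python
import pandas as pd

# Index taken from the question's sample; assumed sorted in ascending order.
idx = pd.DatetimeIndex(
    ["2017-01-02 15:00:00", "2017-01-02 16:00:00", "2017-01-02 18:00:00"],
    name="DateTime",
)

# Time elapsed since the previous timestamp, labelled by the current timestamp.
deltas = idx.to_series().diff()

# Keep only the steps longer than one hour; each label marks the end of a gap.
gaps = deltas[deltas > pd.Timedelta(hours=1)]

# Number of hourly readings missing inside each gap.
n_missing = gaps // pd.Timedelta(hours=1) - 1
print(n_missing)
```

Each row of n_missing is labelled with the timestamp that follows a gap, so the reading before it in the index marks where the gap starts.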

1 Comment

Thanks, it worked. Had to change to df.index.min() and so on because DateTime was index column.
0

Is it safe to assume that the datetime format will always be the same? If so, you could extract the hour from each timestamp and compare consecutive values against the interval you want, e.g.:

import re

# Store some datetime values for demonstration
datetimes = [
    "2017-01-02 15:00:00",
    "2017-01-02 16:00:00",
    "2017-01-02 18:00:00",
    "2019-12-07 22:00:00",
    "2019-12-07 23:00:00",
]

# Extract the hour via regex (in this format, the first two digits followed by a colon are the hour)
findHour = re.compile(r"\d{2}(?=:)")
# Start from the first timestamp so that every consecutive pair is checked
prevx = findHour.findall(datetimes[0])[0]

# Simple comparison: compute the difference to the previous hour, then make the current hour the previous one.
# Note: only the hour is compared, so the date is ignored and gaps that cross midnight or span whole days can be missed.
for x in datetimes[1:]:
    cmp = findHour.findall(x)[0]
    diff = int(cmp) - int(prevx)
    if diff > 1:
        print("Missing Timestamp(s) between {} and {} hours!".format(prevx, cmp))
    prevx = cmp
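
Because the hour-only comparison ignores the date, one possible refinement is to parse the full timestamps and compare them as datetime objects. The sketch below is an assumption-laden illustration rather than part of the answer: find_gaps is a hypothetical helper, and it presumes the strings are sorted and always follow the "%Y-%m-%d %H:%M:%S" format.

```python
from datetime import datetime, timedelta

# Same sample values as in the regex example.
datetimes = [
    "2017-01-02 15:00:00",
    "2017-01-02 16:00:00",
    "2017-01-02 18:00:00",
    "2019-12-07 22:00:00",
    "2019-12-07 23:00:00",
]


def find_gaps(stamps, step=timedelta(hours=1), fmt="%Y-%m-%d %H:%M:%S"):
    """Return (previous, current) pairs that are more than `step` apart.

    Hypothetical helper: assumes `stamps` is sorted and matches `fmt`.
    """
    parsed = [datetime.strptime(s, fmt) for s in stamps]
    # Pair each timestamp with its successor and keep the pairs with a gap.
    return [(a, b) for a, b in zip(parsed, parsed[1:]) if b - a > step]


gaps = find_gaps(datetimes)
for prev, cur in gaps:
    print("Missing timestamp(s) between {} and {}".format(prev, cur))
```

Comparing whole datetime objects means a jump across days or years is measured by its real duration instead of by the difference between two hour fields.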
