Checking if there is date skip in data

Question

I have a dataset whose index consists of timestamps. It is a pandas Series like the one below:

Time                           
2013-09-17 22:08:11           0
2013-09-17 22:08:18           0
2013-09-17 22:08:26           0
2013-09-17 22:08:34           0
2013-09-17 22:08:42           0
2013-09-17 22:08:50           0
2013-09-17 22:08:58           0
2013-09-17 22:09:06           0
2013-09-17 22:09:11           0
2013-09-17 22:09:13           0
2013-09-17 22:09:19           0
2013-09-17 22:09:21           0
2013-09-17 22:09:27           0
2013-09-17 22:09:35           0
2013-09-17 22:09:43           0
Name: dummy_frame, dtype: float64

The data are recorded at irregular intervals. I want to check whether there is a date skip or jump inside the data, such as from 2013-09-07 to 2013-12-22. I can tell that a jump exists by comparing the first and last dates, but I need to find where the jump occurs. Is there an easy way to find it?

Thank you.

Bruno Mello · Accepted Answer · 2020-04-06 20:03:09Z

1

IIUC:

import pandas as pd

# x is your series
x.index = pd.to_datetime(x.index)
# Calendar-day difference between each timestamp and the previous one
jump = x.index.to_series().dt.normalize().diff().dt.days

This creates a series where jump[i] is the number of calendar days between the date of row i and the date of row i-1. To find where the jump is larger than one day, just do:

x[jump>1]
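For reference, here is a minimal self-contained sketch of the same idea, assuming the index is (or can be parsed into) a DatetimeIndex. The sample timestamps and the names `find_date_jumps` and `min_days` are made up for illustration and do not come from the question.

```python
import pandas as pd


def find_date_jumps(series: pd.Series, min_days: int = 1) -> pd.DataFrame:
    """Return one row per gap of more than `min_days` calendar days.

    Hypothetical helper: each row holds the last timestamp before the gap,
    the first timestamp after it, and the gap length in days.
    """
    index = pd.to_datetime(series.index)
    stamps = pd.Series(index, index=index).sort_index()
    # Drop the time of day so only calendar dates are compared.
    days = stamps.dt.normalize()
    gap_days = days.diff().dt.days
    # The first row has no predecessor (NaN), so it is never flagged.
    mask = (gap_days > min_days).to_numpy()
    return pd.DataFrame(
        {
            "previous": stamps.shift(1).to_numpy()[mask],
            "current": stamps.to_numpy()[mask],
            "gap_days": gap_days.to_numpy()[mask].astype(int),
        }
    )


# Illustrative data only: a jump from 2013-09-17 to 2013-12-22.
example = pd.Series(
    0.0,
    index=pd.to_datetime(
        [
            "2013-09-17 22:08:11",
            "2013-09-17 22:08:18",
            "2013-12-22 09:00:00",
            "2013-12-22 09:00:08",
        ]
    ),
    name="dummy_frame",
)

jumps = find_date_jumps(example)
print(jumps)
```

Returning both the previous and the current timestamp shows exactly where each jump starts and ends, rather than only the row that follows it.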

edited Apr 6, 2020 at 20:03

answered Apr 6, 2020 at 18:19

Bruno Mello


3 Comments

EnriqueBet Over a year ago

Hi, I was wondering whether shift(1) in your approach shifts the dates by seconds instead of days. I am curious because the datetime index contains seconds as well.

ICHaLiL Over a year ago

I get an error when I use x.index.dt.date: "AttributeError: 'DatetimeIndex' object has no attribute 'dt'"

Bruno Mello Over a year ago

@EnriqueBet shift(1) doesn't change the values of the series, it shifts the series down by some amount pandas.pydata.org/pandas-docs/stable/reference/api/…
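A small sketch of what the comment above describes, using made-up values: calling `shift(1)` on a Series without a `freq` argument leaves the index untouched and moves the values down by one row, so the first row becomes NaN.

```python
import pandas as pd

# Illustrative values only; the timestamps mimic the format in the question.
s = pd.Series(
    [10, 20, 30],
    index=pd.to_datetime(
        ["2013-09-17 22:08:11", "2013-09-17 22:08:18", "2013-09-17 22:08:26"]
    ),
)

# Values move down one position; the timestamps in the index do not change.
shifted = s.shift(1)
print(shifted)
```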

ICHaLiL · Accepted Answer · 2020-04-06 19:54:38Z

0

I believe you could simply create a date range with the same date format and compare both lists:

from datetime import datetime,timedelta

start_date = datetime.strptime("2013-09-07","%Y-%m-%d")
end_date = datetime.strptime("2013-12-22","%Y-%m-%d")

# This will create a list with complete dates
completeDates = [start_date + timedelta(days=x) for x in range(0, (end_date - start_date).days + 1)]
completeDates = [d.strftime("%Y-%m-%d") for d in completeDates] # Convert date to string

# Get your list from data frame index, and remove hours
myDates = dummy_frame.index.tolist()

# Your dates may be strings or datetime objects; either way, keep only the date part
myDates = [d.split()[0] if isinstance(d, str) else d.strftime("%Y-%m-%d") for d in myDates]

# Creates a list with missing data
missingDates = [d for d in completeDates if d not in myDates]

missingDates will then be a list containing all the dates missing from your data frame, i.e. the jumps. Please let me know if this helps!
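The same comparison can also be sketched with pandas alone, assuming the index is a DatetimeIndex. This is one possible variant rather than the code above, and the sample data is made up for illustration.

```python
import pandas as pd

# Illustrative data only: nothing is recorded on 2013-09-08 and 2013-09-09.
example = pd.Series(
    0.0,
    index=pd.to_datetime(
        ["2013-09-07 22:08:11", "2013-09-07 23:10:00", "2013-09-10 01:00:00"]
    ),
)

# Calendar dates that actually occur in the data (time of day dropped).
present = example.index.normalize().unique()

# Every calendar date between the first and last observation.
expected = pd.date_range(present.min(), present.max(), freq="D")

# Dates that are expected but never appear in the data.
missing = expected.difference(present)
print(missing.strftime("%Y-%m-%d").tolist())
```

Here the start and end dates are taken from the data itself instead of being hard-coded, and `DatetimeIndex.difference` replaces the list comprehension.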

edited Apr 6, 2020 at 19:54

ICHaLiL


answered Apr 6, 2020 at 18:14

EnriqueBet


2 Comments

ICHaLiL Over a year ago

Thank you for your detailed answer. Nice approach and it has worked for me.

EnriqueBet Over a year ago

You are welcome, if you feel like it, you could select this approach as your preferred answer, so more people will be able to find it.
