I have a dataset whose index consists of timestamps. It is a pandas Series like the one below:

Time                           
2013-09-17 22:08:11           0
2013-09-17 22:08:18           0
2013-09-17 22:08:26           0
2013-09-17 22:08:34           0
2013-09-17 22:08:42           0
2013-09-17 22:08:50           0
2013-09-17 22:08:58           0
2013-09-17 22:09:06           0
2013-09-17 22:09:11           0
2013-09-17 22:09:13           0
2013-09-17 22:09:19           0
2013-09-17 22:09:21           0
2013-09-17 22:09:27           0
2013-09-17 22:09:35           0
2013-09-17 22:09:43           0
Name: dummy_frame, dtype: float64

The data are recorded at irregular intervals. I want to check whether there is a skip or jump in the dates, for example from 2013-09-07 to 2013-12-22. I can detect that a jump exists by comparing the first and last dates, but I need to find where the jump occurs. Is there an easy way to find it?

Thank you.

2 Answers

1

IIUC:

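import pandas as pd

# x is your series
x.index = pd.to_datetime(x.index)
days = x.index.to_series().dt.normalize()  # drop the time of day
jumps = days.diff().dt.days  # whole days since the previous row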

This creates a series where jumps[i] is the number of days between the date at row i and the date at row i-1. To find where the jump is more than one day, just do:

x[jumps > 1]
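
For reference, here is a minimal runnable sketch of the same idea. It assumes the index can be parsed as datetimes; the sample timestamps and the names `s`, `gaps`, `threshold`, and `jump_rows` are made up for illustration. It diffs the full timestamps rather than calendar dates, so the threshold can be any `Timedelta`, not only whole days.

```python
import pandas as pd

# Hypothetical sample: readings a few seconds apart, then a jump of several months.
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-07 22:08:18",
    "2013-09-07 22:08:26",
    "2013-12-22 10:00:00",
    "2013-12-22 10:00:08",
])
s = pd.Series(0.0, index=idx, name="dummy_frame")

# Time elapsed since the previous row; the first row has no predecessor (NaT).
gaps = s.index.to_series().diff()

# Assumed threshold: anything longer than one day counts as a jump.
threshold = pd.Timedelta(days=1)
jump_rows = gaps[gaps > threshold]

print(jump_rows)
```

Each entry in `jump_rows` is labelled with the first timestamp after a gap, and its value is the length of that gap.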

3 Comments

Hi, I was wondering if in your approach shift(1) will shift the date in seconds instead of days? I am curious because the datetime index contains seconds as well.
I got an error when I used x.index.dt.date: "AttributeError: 'DatetimeIndex' object has no attribute 'dt'"
@EnriqueBet shift(1) doesn't change the values of the series, it shifts the series down by some amount pandas.pydata.org/pandas-docs/stable/reference/api/…
0

I believe you could simply create a date range with the same date format and compare the two lists:

from datetime import datetime,timedelta

start_date = datetime.strptime("2013-09-07","%Y-%m-%d")
end_date = datetime.strptime("2013-12-22","%Y-%m-%d")

# This will create a list with complete dates
completeDates = [start_date + timedelta(days=x) for x in range(0, (end_date - start_date).days + 1)]
completeDates = [d.strftime("%Y-%m-%d") for d in completeDates] # Convert date to string

# Get your list from data frame index, and remove hours
myDates = dummy_frame.index.tolist()

# The index values may be strings or datetime objects; keep only the date part
# Strings: take the part before the space. Datetime objects: format as a date string.
myDates = [d.split()[0] if isinstance(d, str) else d.strftime("%Y-%m-%d") for d in myDates]

# Creates a list with missing data
missingDates = [d for d in completeDates if d not in myDates]

With this, missingDates will be a list containing all the dates missing from your data frame, i.e. the jumps. Please let me know if this helps!
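
The same comparison can also be done with pandas alone. This is only a sketch under assumptions: the sample timestamps and the names `s`, `present_days`, `all_days`, and `missing_days` are made up for illustration, and the range is taken from the first and last observation rather than from hard-coded dates.

```python
import pandas as pd

# Hypothetical sample with no observations on 2013-09-09 and 2013-09-10.
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-08 09:15:00",
    "2013-09-11 07:30:00",
])
s = pd.Series(0.0, index=idx, name="dummy_frame")

# Calendar days that actually appear in the data (time of day dropped).
present_days = s.index.normalize().unique()

# Every calendar day between the first and last observation.
all_days = pd.date_range(present_days.min(), present_days.max(), freq="D")

# Days in the full range that have no observations at all.
missing_days = all_days.difference(present_days)

print(missing_days.strftime("%Y-%m-%d").tolist())
```

Working on a `DatetimeIndex` avoids the string conversion and the list membership test, which can matter on a long index.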

2 Comments

Thank you for your detailed answer. Nice approach and it has worked for me.
You are welcome, if you feel like it, you could select this approach as your preferred answer, so more people will be able to find it.
