
I have a sensor that takes a measurement roughly every 60 seconds. There is a small delay between calls, so the data might look like this:

timestamp, value
12:01:45, 100
12:02:50, 90
12:03:55, 87
              # 12:04 missing
12:05:00, 91

I only need precision to the minute, not the second. Since the sensor gathers data all day long, there should be 1440 entries (one per minute of the day); however, some timestamps are missing.

I'm loading this into a pd.DataFrame, and I'd like to have 1440 rows no matter what. How can I insert None values for any missing timestamps?

timestamp, value
12:01:45, 100
12:02:50, 90
12:03:55, 87
12:04:00, None  # Squeezed in a None value
12:05:00, 91

Additionally, some data is missing for several hours at a stretch, and I'd still like to fill those gaps with None.

Ultimately, I wish to plot the data using matplotlib, with the x-axis ranging between (0, 1440), and the y-axis ranging between (0, 100).

  • asfreq maybe? Commented Sep 20, 2021 at 4:38
  • 1. You do not need to fill the missing data to plot with matplotlib; just make sure to use a timedelta/datetime type for x. 2. If you really want to fill, set timestamp as the index and use reindex with a custom list of the times you want (see timedelta_range). Commented Sep 20, 2021 at 4:39
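
The reindex idea from the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample frame is made up to mirror the question, and the names minute, by_minute, all_minutes and filled are hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "timestamp": ["12:01:45", "12:02:50", "12:03:55", "12:05:00"],
    "value": [100, 90, 87, 91],
})

# Time of day as a timedelta, truncated to the whole minute
minute = pd.to_timedelta(df["timestamp"]).dt.floor("min")
by_minute = df.set_index(minute)["value"]
# Keep the first reading if two land in the same minute (reindex needs unique labels)
by_minute = by_minute[~by_minute.index.duplicated(keep="first")]

# Every minute of the day: 00:00 through 23:59
all_minutes = pd.timedelta_range(start="0min", periods=1440, freq="min")
# Minutes without a reading become NaN
filled = by_minute.reindex(all_minutes)
```

Here filled always has 1440 entries, regardless of how many readings were recorded, and the multi-hour gaps are filled in the same way as single missing minutes.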

1 Answer


Use Resampler.first with Series.fillna if you only need to fill the gaps between the first and last timestamp:

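import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'])
# One row per minute; minutes with no reading become NaN
df = df.resample('1min', on='timestamp').first()
# Fill the empty timestamps with the minute they stand for
df['timestamp'] = df['timestamp'].fillna(df.index.to_series())
df = df.reset_index(drop=True)
print(df)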
            timestamp  value
0 2021-09-20 12:01:45  100.0
1 2021-09-20 12:02:50   90.0
2 2021-09-20 12:03:55   87.0
3 2021-09-20 12:04:00    NaN
4 2021-09-20 12:05:00   91.0

If you need every minute of the day, add DataFrame.reindex:

df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.resample('1min', on='timestamp').first()

# All 1440 minutes of the day, 00:00 through 23:59
rng = pd.date_range('00:00:00', '23:59:00', freq='min')
df = df.reindex(rng)
df['timestamp'] = df['timestamp'].fillna(df.index.to_series())

df = df.reset_index(drop=True)
print(df)
               timestamp  value
0    2021-09-20 00:00:00    NaN
1    2021-09-20 00:01:00    NaN
2    2021-09-20 00:02:00    NaN
3    2021-09-20 00:03:00    NaN
4    2021-09-20 00:04:00    NaN
                 ...    ...
1435 2021-09-20 23:55:00    NaN
1436 2021-09-20 23:56:00    NaN
1437 2021-09-20 23:57:00    NaN
1438 2021-09-20 23:58:00    NaN
1439 2021-09-20 23:59:00    NaN

[1440 rows x 2 columns]
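
For the plotting goal in the question (x-axis from 0 to 1440), another option is to index the readings by minute of the day as a plain integer. This is a minimal sketch, not part of the answer above; the sample frame and the names ts, minute and per_minute are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "timestamp": ["12:01:45", "12:02:50", "12:03:55", "12:05:00"],
    "value": [100, 90, 87, 91],
})

# Parse the time of day; only the hour and minute are used below
ts = pd.to_datetime(df["timestamp"], format="%H:%M:%S")
# Minute of the day, from 0 (00:00) to 1439 (23:59)
df["minute"] = ts.dt.hour * 60 + ts.dt.minute

# First reading in each minute, then one slot for every minute of the day
per_minute = df.groupby("minute")["value"].first().reindex(range(1440))
```

The result can be passed to matplotlib directly, for example plt.plot(per_minute.index, per_minute.values) followed by plt.xlim(0, 1440) and plt.ylim(0, 100). Line plots break at NaN values, so the missing minutes show up as gaps rather than as interpolated segments.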