
Here is a sample of my dataset, which has the columns Timestamp, Temperature1 and Temperature2:

Timestamp            Temperature1  Temperature2
09/01/2016 00:00:08          53.4          45.5
09/01/2016 00:00:38          53.5          45.2
09/01/2016 00:01:08          54.6          43.2
09/01/2016 00:01:38          55.2          46.3
09/01/2016 00:02:08          54.5          45.5
09/01/2016 00:04:08          54.2          35.5
09/01/2016 00:05:08          52.4          45.7
09/01/2016 00:05:38          53.4          45.2
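A minimal sketch for loading the sample above into pandas, assuming the timestamps are month/day/year (which matches how the answers below read 09/01/2016 as 2016-09-01). The values are typed in by hand here; with a real file you would more likely use pd.read_csv.

```python
import pandas as pd

# Sample readings from the question, typed in by hand.
df = pd.DataFrame({
    "Timestamp": [
        "09/01/2016 00:00:08", "09/01/2016 00:00:38", "09/01/2016 00:01:08",
        "09/01/2016 00:01:38", "09/01/2016 00:02:08", "09/01/2016 00:04:08",
        "09/01/2016 00:05:08", "09/01/2016 00:05:38",
    ],
    "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
    "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
})

# Parse as month/day/year (an assumption about the original format).
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")

# Both answers below work on a DatetimeIndex.
df = df.set_index("Timestamp")
print(df)
```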

My data arrives every 30 seconds.

Some timestamps are missing from this dataset: a reading should arrive every 30 seconds, so some data points have been skipped. How can I find those missing data points and insert rows for them with NaN values? Please help.

  • Can you post some code or tell us what you have tried? Commented Nov 4, 2016 at 9:25
  • You're looking for resample: pandas.pydata.org/pandas-docs/stable/… Commented Nov 4, 2016 at 10:33

2 Answers


You can use the resample('30S', base=8) method:

In [20]: x.resample('30S', base=8).mean()
Out[20]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2
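The base argument was deprecated in pandas 1.1.0 and removed in pandas 2.0. A sketch of the same call for newer versions, assuming pandas >= 1.1, uses offset to shift the 30-second bins from midnight by 8 seconds:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
)
df.index.name = "Timestamp"

# offset="8s" plays the role of base=8: bins start at 00:00:08, 00:00:38, ...
# Bins that contain no reading come out as NaN after .mean().
out = df.resample("30s", offset="8s").mean()
print(out)
```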

The solution above assumes that Timestamp is of datetime dtype and has been set as the index. If Timestamp is a regular column (not the index), then starting from pandas 0.19.0 you can resample on it directly (it must still be of datetime dtype) using the on='column_name' parameter:

In [26]: x.resample('30S', on='Timestamp', base=8).mean()
Out[26]:
                                 Temperature1    Temperature2
Timestamp
2016-09-01 00:00:08                      53.4            45.5
2016-09-01 00:00:38                      53.5            45.2
2016-09-01 00:01:08                      54.6            43.2
2016-09-01 00:01:38                      55.2            46.3
2016-09-01 00:02:08                      54.5            45.5
2016-09-01 00:02:38                       NaN             NaN
2016-09-01 00:03:08                       NaN             NaN
2016-09-01 00:03:38                       NaN             NaN
2016-09-01 00:04:08                      54.2            35.5
2016-09-01 00:04:38                       NaN             NaN
2016-09-01 00:05:08                      52.4            45.7
2016-09-01 00:05:38                      53.4            45.2
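The same column-based call on pandas >= 1.1, where offset replaces base, might look like this (a sketch; the DataFrame is rebuilt so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
    "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
    "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
})

# Resample on the regular "Timestamp" column instead of the index;
# the column becomes the index of the result.
out = df.resample("30s", on="Timestamp", offset="8s").mean()
print(out)
```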

If you need to determine the base value dynamically, you can do it this way:

In [21]: x.index[0].second
Out[21]: 8
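On pandas >= 1.1 there is also a way to avoid computing the base by hand: origin='start' anchors the bins at the first timestamp in the data. A sketch, assuming the same indexed DataFrame as in the first example (rebuilt here):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
)

# Bins are anchored at the first index value (00:00:08), so no base/offset
# needs to be derived from df.index[0].second.
out = df.resample("30s", origin="start").mean()
print(out)
```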

From the docs:

base : int, default 0

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for 5min frequency, base could range from 0 through 4.

Defaults to 0


Assuming the timestamps have been converted to datetime, if you set the timestamp column as the index and then reindex with a date range, the missing values will show up:

In [94]:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp')
df

Out[94]:
                     Temperature1  Temperature2
Timestamp                                      
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2

In [96]:    
df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq='30s'))

Out[96]:
                     Temperature1  Temperature2
2016-09-01 00:00:08          53.4          45.5
2016-09-01 00:00:38          53.5          45.2
2016-09-01 00:01:08          54.6          43.2
2016-09-01 00:01:38          55.2          46.3
2016-09-01 00:02:08          54.5          45.5
2016-09-01 00:02:38           NaN           NaN
2016-09-01 00:03:08           NaN           NaN
2016-09-01 00:03:38           NaN           NaN
2016-09-01 00:04:08          54.2          35.5
2016-09-01 00:04:38           NaN           NaN
2016-09-01 00:05:08          52.4          45.7
2016-09-01 00:05:38          53.4          45.2
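For a regular series, this reindex can also be written with DataFrame.asfreq, which builds a date range from the first to the last index value at the given frequency and reindexes onto it. A sketch, assuming the indexed df from above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
)

# Conform to a 30-second grid; labels absent from df become NaN rows.
full = df.asfreq("30s")
print(full)
```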

This assumes that the timestamps are regular. Here we construct a date range from the first and last timestamp values with a frequency of 30 seconds:

In [97]:
pd.date_range(start=df.index[0], end=df.index[-1], freq='30s')

Out[97]:
DatetimeIndex(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
               '2016-09-01 00:01:08', '2016-09-01 00:01:38',
               '2016-09-01 00:02:08', '2016-09-01 00:02:38',
               '2016-09-01 00:03:08', '2016-09-01 00:03:38',
               '2016-09-01 00:04:08', '2016-09-01 00:04:38',
               '2016-09-01 00:05:08', '2016-09-01 00:05:38'],
              dtype='datetime64[ns]', freq='30S')

When you reindex with this, any missing index labels become rows of NaN values.
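The question also asks how to find which data points are missing. A minimal sketch, assuming the same regular 30-second grid as above, takes the set difference between the expected range and the existing index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
)

# Every timestamp we expect if a reading arrives exactly every 30 seconds.
expected = pd.date_range(start=df.index[0], end=df.index[-1], freq="30s")

# Expected timestamps that never arrived. A reading that drifts off the grid
# (e.g. 00:02:09) would not match and would show up here as missing.
missing = expected.difference(df.index)
print(missing)
```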

Comments

Thanks, it really works. And if we compute an average, can we also calculate an average for the missing data, based on the previous data?
Sorry what do you mean? You want to fill in the missing values with the mean or you want mean to include the missing row values?
I want to calculate the mean for the missing values.
So you mean df.mean(skipna=False)?
Another question has come up, which I will post separately: some of my code does not give the proper output on Linux, while the same code runs properly on Windows.
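On the follow-up about averages: by default df.mean() skips NaN values, while df.mean(skipna=False) returns NaN for any column that contains a gap. If the goal is instead to fill the gaps, a few common options are sketched below, assuming the reindexed frame from the second answer; which one is appropriate depends on the data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Temperature1": [53.4, 53.5, 54.6, 55.2, 54.5, 54.2, 52.4, 53.4],
        "Temperature2": [45.5, 45.2, 43.2, 46.3, 45.5, 35.5, 45.7, 45.2],
    },
    index=pd.to_datetime([
        "2016-09-01 00:00:08", "2016-09-01 00:00:38", "2016-09-01 00:01:08",
        "2016-09-01 00:01:38", "2016-09-01 00:02:08", "2016-09-01 00:04:08",
        "2016-09-01 00:05:08", "2016-09-01 00:05:38",
    ]),
)
full = df.reindex(pd.date_range(start=df.index[0], end=df.index[-1], freq="30s"))

# Option 1: fill each gap with that column's overall mean (NaNs skipped).
filled_mean = full.fillna(full.mean())

# Option 2: carry the previous reading forward.
filled_prev = full.ffill()

# Option 3: interpolate linearly in time between the neighbouring readings.
filled_interp = full.interpolate(method="time")

print(filled_interp)
```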