1

I have a large data set, a sample is given below. The data is recorded for 1 day with 5-min interval for 24 hours for 3214 unique ids.

  • The time and date information is missing in some rows as shown in the image. Is there a way to automatically fill the missing date and time for every unique_id? Every unique id has 288 datetime values (5 min * 12 hr * 24 hr = 288 values per day with 5 min increment in time interval)
  • Also, how to fill the missing unqiue ids with previous value?
  • I am trying to fill the last column VALUE with empty cells.

I searched for the relevant answers 1, 2 but it did not work.

The most relent answer was here, but the solutions are only for missing dates not for both date and times.

sample_data.csv

datetime    Unique_ID   VALUE
28/01/2018 2:05 105714  284
28/01/2018 2:10 105714  292
28/01/2018 2:15 105714  287
28/01/2018 2:20 105714  193
28/01/2018 2:25 105714  2348
28/01/2018 2:30 105714  284
28/01/2018 2:35 105714  286
28/01/2018 2:40 105714  293
28/01/2018 2:45 105714  229
28/01/2018 2:50 105714  2292
28/01/2018 2:55 105714  225
28/01/2018 4:45 105714  185
28/01/2018 4:50 105714  277
28/01/2018 4:55 105714  185
28/01/2018 5:00 105714  2435
28/01/2018 5:05 105714  273
28/01/2018 5:10 105714  2360
28/01/2018 5:15 105714  282

enter image description here

2
  • with what value would you like to fill the 'value' column Commented Apr 24, 2021 at 13:36
  • @Devakumarkp. With blank or empty cells like here stackoverflow.com/questions/31470551/… Commented Apr 24, 2021 at 14:58

1 Answer 1

1

Here in pandas you can try:

Firstly get the index of nan values:

idx=df[df.isna().all(1)].index

Then use ffill() method:

df[['Unique_ID','datetime']]=df[['Unique_ID','datetime']].ffill()

Finally you can use groupby() method and transform() method:

df['VALUE']=df.groupby(['Unique_ID','datetime'],dropna=False).transform('last')

Now if you want to increment 'datetime' column value by 5 mins which are filled by ffill() method then:

df.loc[idx,'datetime']=df.loc[idx,'datetime']+pd.to_timedelta(5,unit='m')
Sign up to request clarification or add additional context in comments.

4 Comments

Updated my answer kindly have a look :)
The above solution doesn,t worked for me as it depends on nan values. The data I am trying doesn,t have nan value or any blank spaces. Rather a missing information between the time intervals as shown here. i.sstatic.net/xNcb5.png
For instance, there are 22 values of datetime are missing in the above shared image. I am trying for a solution which will automatically fill the datetime value with 5 min increment for 24 hrs (288 values in total for 24 hr).
next time pls be more specific and clear when asking question as no where in the question you are mentioning that missing is not refferring to nan but the values between the 2 dates

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.