I'm collecting time series data, but for some time points there is no data to collect. For example, if I am collecting data across four time points, I might get a dataframe like this:
import pandas as pd
df_ = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                    'time': [1, 2, 4, 1, 3, 4],
                    'value': [100, 105, 111, 200, 234, 222]})
Sometimes a data point is missing, so there is no row for that time point. I would like to use groupby and forward fill with the previous value to create the missing rows, so the result would look like this:
df_missing_completed = pd.DataFrame({'group': ['A']*4+['B']*4,
'time': [1,2,3,4,1,2,3,4],
'value': [100, 101, 105,111,200, 202, 234,222]})
I had the idea that I could create a new dataframe as a template with all the dates and time points but no values, join it with the real data (which would induce NAs), and do an ffill on the value column to fill in the missing data, like below:
df_template = pd.DataFrame({'group': ['A']*4+['B']*4,
'time': [1,2,3,4,1,2,3,4]})
df_final = pd.merge(df_template, df_, on = ['group', 'time'], how='left')
# forward fill within each group so values never leak from one group into the next
df_final['filled_values'] = df_final.groupby('group')['value'].ffill()
This seems like a messy solution, though, and with the real data df_template will be more complex to create. Does anyone know a better one? Thanks!
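One way to avoid writing the template by hand might be to build it from the data itself. The sketch below is not a definitive answer: it assumes the time points are consecutive integers between the smallest and largest observed time, and the names full_index and df_filled are made up for illustration.

```python
import pandas as pd

df_ = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                    'time': [1, 2, 4, 1, 3, 4],
                    'value': [100, 105, 111, 200, 234, 222]})

# Every (group, time) pair, derived from the values seen in the data.
# Assumption: time points are consecutive integers from min to max.
full_index = pd.MultiIndex.from_product(
    [df_['group'].unique(),
     range(df_['time'].min(), df_['time'].max() + 1)],
    names=['group', 'time'])

# Reindexing adds the missing rows with NaN in 'value'
df_filled = (df_.set_index(['group', 'time'])
                .reindex(full_index)
                .reset_index())

# Forward fill within each group only
df_filled['value'] = df_filled.groupby('group')['value'].ffill()
print(df_filled)
```

With this sample data the filled rows take the previous observed value in the same group (105 for group A at time 3, 200 for group B at time 2). If a group is missing its first time point, that row stays NaN because there is nothing earlier to fill from.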
reindex with method='ffill'
df_missing_completed doesn't correspond to df_final, as mozway pointed out in the first comment
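For what it's worth, here is a rough sketch of how reindex with method='ffill' could be applied per group. It assumes the times within each group are already sorted in increasing order (reindex needs a monotonic index for method='ffill') and that the full set of time points is the integer range from the smallest to the largest observed time; all_times, pieces and df_reindexed are illustrative names only.

```python
import pandas as pd

df_ = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                    'time': [1, 2, 4, 1, 3, 4],
                    'value': [100, 105, 111, 200, 234, 222]})

# Assumption: the complete set of time points is min..max as integers
all_times = list(range(df_['time'].min(), df_['time'].max() + 1))

pieces = []
for name, g in df_.groupby('group'):
    # Index each group by time, then reindex onto the full set of times,
    # carrying the last observed value forward into the gaps
    s = g.set_index('time')['value'].reindex(all_times, method='ffill')
    pieces.append(s.rename_axis('time').reset_index().assign(group=name))

df_reindexed = pd.concat(pieces, ignore_index=True)[['group', 'time', 'value']]
print(df_reindexed)
```

Because each group is reindexed on its own, values cannot spill over from one group into the next.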