Pandas: enter missing rows in a dataframe

Question

I'm collecting time series data, but for some time points there is no data to collect. For example, when collecting data across four time points, I might get a dataframe like this:

df_ = pd.DataFrame({'group': ['A']*3+['B']*3,
                    'time': [1,2,4,1,3,4],
                    'value': [100,105,111,200,234,222]})

Sometimes a data point is missing, so there is no row for that time point. I would like to group by the group column and forward fill with the previous value to create the missing row, so that the result looks like this:

df_missing_completed = pd.DataFrame({'group': ['A']*4+['B']*4,
                                     'time': [1,2,3,4,1,2,3,4],
                                     'value': [100, 101, 105,111,200, 202, 234,222]})

I had the idea that I could create a new dataframe as a template with all the dates and time points but without any values, join it with the real data, which would introduce NAs, and then do a ffill on the value column to fill in the missing data, like below:

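df_template = pd.DataFrame({'group': ['A']*4+['B']*4,
                            'time': [1,2,3,4,1,2,3,4]})
# the left join keeps every template row; unmatched rows get NaN in 'value'
df_final = pd.merge(df_template, df_, on=['group', 'time'], how='left')
# .ffill() is equivalent to fillna(method='ffill'), which newer pandas versions deprecate
df_final['filled_values'] = df_final['value'].ffill()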

But this seems like a messy solution, and with the real data df_template will be more complex to create. Does anyone know a better one? Thanks!

You don't really do a ffill here, can you clarify the logic?
– mozway, Commented Dec 1, 2022 at 14:05
You may want to have a look at reindex with method='ffill'.
– Stef, Commented Dec 1, 2022 at 14:05
Thanks @mozway. I edited my question to clarify how it would work with a template dataframe and ffill. But I don't find it to be a satisfying solution.
– pd441, Commented Dec 1, 2022 at 14:14
Your df_missing_completed doesn't correspond to df_final, as mozway pointed out in the first comment.
– Stef, Commented Dec 1, 2022 at 15:17

mozway · Accepted Answer · 2022-12-01 21:20:44Z

2

I would use:

(df_.pivot(index='time', columns='group', values='value')
    # reindex only if you want to add missing times for all groups
    .reindex(range(df_['time'].min(), df_['time'].max()+1))
    .ffill().unstack().reset_index(name='value')
)

Output:

  group  time  value
0     A     1  100.0
1     A     2  105.0
2     A     3  105.0
3     A     4  111.0
4     B     1  200.0
5     B     2  200.0
6     B     3  234.0
7     B     4  222.0
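
The values above come out as floats because the pivot introduces NaN for the missing time points. A minimal sketch of casting back to the original dtype after filling, assuming every group has a value at the first time point so that no NaN survives the ffill (the name filled is only for illustration):

```python
import pandas as pd

df_ = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                    'time': [1, 2, 4, 1, 3, 4],
                    'value': [100, 105, 111, 200, 234, 222]})

filled = (df_.pivot(index='time', columns='group', values='value')
             .reindex(range(df_['time'].min(), df_['time'].max() + 1))
             .ffill()
             .unstack()
             .reset_index(name='value'))

# Assumption: no NaN is left after ffill, otherwise this cast raises.
filled['value'] = filled['value'].astype(df_['value'].dtype)
print(filled)
```

If a group starts later than the others, its leading rows stay NaN and the cast fails, so in that case the float column would have to be kept or the NaN rows dropped first.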

edited Dec 1, 2022 at 21:20

answered Dec 1, 2022 at 14:17

4 Comments

Stef Over a year ago

this is fine for the given example but won't work in general, e.g. if the same time is missing for all groups, say for 'group': ['A']*3+['B']*3, 'time': [1,2,4,1,2,4],

mozway Over a year ago

@Stef it's not clear that OP wants to fill those, but if that is the case, you can reindex before the ffill

pd441 Over a year ago

@mozway, thanks, it seems to work, however, with my large dataset, there may be cases where the same time is missing for all groups. Would it not work then?

mozway Over a year ago

@stevezissou if you want to add all numbers you need to add a reindex step, is this what you want? See the updated answer.
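
The case raised in these comments can be checked with a small sketch. It uses Stef's time column, where time 3 is missing for both groups; the values and the names df_gap, wide and full_range are made up for illustration:

```python
import pandas as pd

# Time 3 is missing for both groups.
df_gap = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                       'time': [1, 2, 4, 1, 2, 4],
                       'value': [100, 105, 111, 200, 234, 222]})

wide = df_gap.pivot(index='time', columns='group', values='value')

# Without the reindex step, the pivot only knows the times present in the data.
without_reindex = wide.ffill().unstack().reset_index(name='value')

# With the reindex step, every integer time between min and max gets a row.
full_range = range(df_gap['time'].min(), df_gap['time'].max() + 1)
with_reindex = (wide.reindex(full_range)
                    .ffill()
                    .unstack()
                    .reset_index(name='value'))

print(sorted(without_reindex['time'].unique()))  # time 3 is absent
print(sorted(with_reindex['time'].unique()))     # time 3 is present
```

This assumes integer time steps; for timestamps, something like pd.date_range would be needed in place of range.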

Stef · Accepted Answer · 2022-12-01 15:05:20Z

1

Instead of a template dataframe you could create a new index and then reindex with ffill:

new_idx = pd.MultiIndex.from_product([list('AB'), range(1,5)], names=['group', 'time'])
df_.set_index(['group', 'time']).reindex(new_idx, method='ffill').reset_index()

The result keeps the datatype of the value column:

  group  time  value
0     A     1    100
1     A     2    105
2     A     3    105
3     A     4    111
4     B     1    200
5     B     2    200
6     B     3    234
7     B     4    222
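
For real data, the new index does not have to be typed out by hand. A possible variant, not part of the answer above, builds it from the data and forward fills within each group, so a value can never carry over from the end of one group into the start of the next. It assumes integer time steps, and unlike reindex with method='ffill' it turns the value column into floats:

```python
import pandas as pd

df_ = pd.DataFrame({'group': ['A']*3 + ['B']*3,
                    'time': [1, 2, 4, 1, 3, 4],
                    'value': [100, 105, 111, 200, 234, 222]})

# Build the full (group, time) grid from the data instead of hard-coding it.
groups = df_['group'].unique()
times = range(df_['time'].min(), df_['time'].max() + 1)
new_idx = pd.MultiIndex.from_product([groups, times], names=['group', 'time'])

# A plain reindex inserts NaN rows for the missing (group, time) pairs.
out = df_.set_index(['group', 'time']).reindex(new_idx)

# Forward fill per group; a group whose first time point is missing keeps NaN there.
out['value'] = out.groupby(level='group')['value'].ffill()
out = out.reset_index()
print(out)
```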

answered Dec 1, 2022 at 15:05
