Add missing days in a dataframe

Question

I need to fill missing days in the column 'day':

    id  month   day trans
0   0     8     1   9
1   0     8     2   5
2   0     8     3   10
3   0     8     4   6
4   0     8     6   4
5   0     8     8   4

I am looking for this output:

    id  month   day trans
0   0     8     1   9
1   0     8     2   5
2   0     8     3   10
3   0     8     4   6
4   0     8     5   NaN
5   0     8     6   4
6   0     8     7   NaN
7   0     8     8   4

In this case, I'm only dealing with August, September and October. – Youssef Razak, commented Dec 30, 2020 at 9:55
Would all groups have the same number of days, or can they vary? If they vary, what happens when there are, say, 4 days in the next group? Can you modify the example a bit? – anky, commented Dec 30, 2020 at 10:03
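On the question of varying group sizes: August and October have 31 days while September has 30, so a fixed list of days cannot fit every month. A minimal sketch, assuming all rows belong to one year (2020 here, since the question gives none) and that each (id, month) should run from day 1 to that month's last calendar day, builds the full (id, month, day) grid with `calendar.monthrange` and reindexes onto it. The September row is hypothetical, added only to show a 30-day month.

```python
import calendar

import pandas as pd

YEAR = 2020  # assumption: the question does not state the year

# The question's August rows plus one hypothetical September row
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 0, 0, 0],
    "month": [8, 8, 8, 8, 8, 8, 9],
    "day":   [1, 2, 3, 4, 6, 8, 30],
    "trans": [9, 5, 10, 6, 4, 4, 7],
})

# Every (id, month, day) triple, using each month's real length
full_index = pd.MultiIndex.from_tuples(
    [
        (i, m, d)
        for i, m in df[["id", "month"]].drop_duplicates().itertuples(index=False)
        for d in range(1, calendar.monthrange(YEAR, m)[1] + 1)
    ],
    names=["id", "month", "day"],
)

# Missing days get NaN in 'trans'; id, month and day come back from the index
filled = df.set_index(["id", "month", "day"]).reindex(full_index).reset_index()
print(filled.head(8))
```

Because id and month are part of the index, they never need forward-filling here.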

wwnde · Accepted Answer · 2020-12-30 10:16:31Z

1

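Use reindex():

import pandas as pd
# Reindex over day 1 to the last observed day; missing days become NaN rows, then carry id/month forward
df1 = df.set_index('day').reindex(range(1, df['day'].max() + 1)).reset_index()
df1[['month', 'id']] = df1[['month', 'id']].ffill()

Following your comment about the duplicate-axis error, reindex on id and day together:

# (id, day) pairs stay unique even when a day repeats across ids
mux = pd.MultiIndex.from_product([df['id'].unique(), range(1, df['day'].max() + 1)], names=['id', 'day'])
df1 = df.set_index(['id', 'day']).reindex(mux).reset_index()
# id is restored from the index, so only month needs forward-filling
df1['month'] = df1['month'].ffill()

   id  day  month  trans
0   0    1    8.0    9.0
1   0    2    8.0    5.0
2   0    3    8.0   10.0
3   0    4    8.0    6.0
4   0    5    8.0    NaN
5   0    6    8.0    4.0
6   0    7    8.0    NaN
7   0    8    8.0    4.0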
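A self-contained run of the MultiIndex reindex on the question's sample data. Two choices here are assumptions rather than part of the answer: the days run from 1 to the largest day present (so day 8 is kept), and month is forward-filled within each id, so a value from one id cannot spill into the next id's first rows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 0, 0],
    "month": [8, 8, 8, 8, 8, 8],
    "day":   [1, 2, 3, 4, 6, 8],
    "trans": [9, 5, 10, 6, 4, 4],
})

# Assumption: fill from day 1 up to the last day seen in the data
days = range(1, df["day"].max() + 1)
mux = pd.MultiIndex.from_product([df["id"].unique(), days], names=["id", "day"])

df1 = df.set_index(["id", "day"]).reindex(mux).reset_index()
# Forward-fill month per id so one id's month never leaks into the next
df1["month"] = df1.groupby("id")["month"].ffill().astype(int)
df1 = df1[["id", "month", "day", "trans"]]
print(df1)
```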

edited Dec 30, 2020 at 10:16

answered Dec 30, 2020 at 9:44

wwnde



4 Comments

Youssef Razak Over a year ago

I get the following error: ValueError: cannot reindex from a duplicate axis

wwnde Over a year ago

You probably have duplicate days in the day column. If so, the solution is to use at least two columns that together identify each row uniquely. In that case, let's add id: mux = pd.MultiIndex.from_product([df['id'].unique(),[1,2,3,4,5,6,7]], names=['id','day']) and then df.set_index(['id','day']).reindex(mux).
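The ValueError reported above appears when the same day occurs for more than one id, because reindexing needs unique index labels (the exact message wording differs between pandas versions). A small sketch with two hypothetical ids that share day numbers shows the failure on 'day' alone and the fix with an (id, day) MultiIndex:

```python
import pandas as pd

# Hypothetical data: ids 0 and 1 both have days 1 and 3
df = pd.DataFrame({
    "id":    [0, 0, 1, 1],
    "month": [8, 8, 8, 8],
    "day":   [1, 3, 1, 3],
    "trans": [9, 10, 2, 7],
})

try:
    # 'day' alone has duplicate labels, so reindexing on it fails
    df.set_index("day").reindex(range(1, 4))
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)

# (id, day) pairs are unique, so the MultiIndex can be reindexed safely
mux = pd.MultiIndex.from_product([df["id"].unique(), range(1, 4)], names=["id", "day"])
fixed = df.set_index(["id", "day"]).reindex(mux).reset_index()
print(fixed)
```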

wwnde Over a year ago

Does that help? I'm willing to help further.

wwnde Over a year ago

Thanks. Please upvote the answer so that others can use your question and my answer with confidence in the future. An upvote for you too.

Ludovic H · Accepted Answer · 2020-12-30 09:39:22Z

0

I think the best way to deal with it is to build a pandas DataFrame containing every [id, month, day] combination you want in the output, and then left-merge your original DataFrame onto it on the [id, month, day] key.
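A minimal sketch of that merge approach on the question's data. It assumes the days to cover run from 1 to the last day observed, and it builds the grid with `merge(how="cross")`, which requires pandas 1.2 or later:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id":    [0, 0, 0, 0, 0, 0],
    "month": [8, 8, 8, 8, 8, 8],
    "day":   [1, 2, 3, 4, 6, 8],
    "trans": [9, 5, 10, 6, 4, 4],
})

# Every (id, month) pair crossed with the days to cover
# (assumption: day 1 through the last day observed in the data)
pairs = df[["id", "month"]].drop_duplicates()
days = pd.DataFrame({"day": range(1, df["day"].max() + 1)})
grid = pairs.merge(days, how="cross")

# A left merge keeps every grid row; days absent from df get NaN in 'trans'
out = grid.merge(df, on=["id", "month", "day"], how="left")
print(out)
```

Unlike reindex, a merge does not require the keys in df to be unique, although duplicate keys would produce duplicate output rows.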

answered Dec 30, 2020 at 9:39

Ludovic H



Ferris · Accepted Answer · 2020-12-30 11:19:53Z

0

Using pandas upsampling:

import pandas as pd

# Build a real date from month and day (the year 2020 is arbitrary)
df['date'] = pd.to_datetime(df[['month', 'day']].assign(year=2020))
df = df.set_index('date')
# Upsampling to daily frequency; missing days become rows of NaN
df_daily = df.resample('D').asfreq().reset_index()

# Reassign month and day from the date, and carry id forward
df_daily['month'] = df_daily['date'].dt.month
df_daily['day'] = df_daily['date'].dt.day
df_daily['id'] = df_daily['id'].ffill().astype(int)
df_daily = df_daily.drop(columns='date')
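Because upsampling works on real dates, it also fills gaps that cross a month boundary, which matters for the August to October data mentioned in the comments. A small sketch with hypothetical rows on August 30 and September 2, keeping the year 2020 as an assumption:

```python
import pandas as pd

# Hypothetical rows on either side of the August/September boundary
df = pd.DataFrame({
    "id":    [0, 0],
    "month": [8, 9],
    "day":   [30, 2],
    "trans": [3, 8],
})

# Build dates (year 2020 is an assumption) and upsample to daily rows
df["date"] = pd.to_datetime(df[["month", "day"]].assign(year=2020))
df_daily = df.set_index("date").resample("D").asfreq().reset_index()

# Recompute month/day from the date and carry id forward
df_daily["month"] = df_daily["date"].dt.month
df_daily["day"] = df_daily["date"].dt.day
df_daily["id"] = df_daily["id"].ffill().astype(int)
df_daily = df_daily.drop(columns="date")
print(df_daily)
```

The gap rows come out as August 31 and September 1, with no month-length bookkeeping needed. With several ids, each id would need to be resampled separately.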

edited Dec 30, 2020 at 11:19

answered Dec 30, 2020 at 11:06

Ferris

