
I need to fill missing days in the column 'day':

    id  month   day trans
0   0     8     1   9
1   0     8     2   5
2   0     8     3   10
3   0     8     4   6
4   0     8     6   4
5   0     8     8   4

I am looking for this output:

    id  month   day trans
0   0     8     1   9
1   0     8     2   5
2   0     8     3   10
3   0     8     4   6
4   0     8     5   NaN
5   0     8     6   4
6   0     8     7   NaN
7   0     8     8   4
  • How do you deal with February? Commented Dec 30, 2020 at 9:51
  • In this case, I'm only dealing with August, September and October. Commented Dec 30, 2020 at 9:55
  • Would all groups have the same number of days, or can they vary? If they vary, what happens when there are, say, 4 days in the next group? Can you modify the example a bit? Commented Dec 30, 2020 at 10:03

3 Answers


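Use reindex():

import pandas as pd

# Make 'day' the index, then reindex to every day from 1 to 8; missing days get NaN
df1 = df.set_index('day').reindex(range(1, 9)).reset_index()
# Carry id and month into the inserted rows
df1[['month', 'id']] = df1[['month', 'id']].ffill()

Following your comment (duplicate days across ids), reindex on both id and day:

mux = pd.MultiIndex.from_product([df['id'].unique(), range(1, 9)], names=['id', 'day'])
df1 = df.set_index(['id', 'day']).reindex(mux).reset_index()
# Fill month within each id only, so one id's month never leaks into another
df1['month'] = df1.groupby('id')['month'].transform(lambda s: s.ffill().bfill())

   id  day  month  trans
0   0    1    8.0    9.0
1   0    2    8.0    5.0
2   0    3    8.0   10.0
3   0    4    8.0    6.0
4   0    5    8.0    NaN
5   0    6    8.0    4.0
6   0    7    8.0    NaN
7   0    8    8.0    4.0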

4 Comments

I get the following error: ValueError: cannot reindex from a duplicate axis
You probably have duplicate days in the day column. If so, the solution is to index on at least two columns that together identify each row uniquely. In this case, add id: mux = pd.MultiIndex.from_product([df['id'].unique(),[1,2,3,4,5,6,7]], names=['id','day']) and df.set_index(['id','day']).reindex(mux)
Does that help? Happy to help further.
Thanks. Please upvote the answer so that future readers can use your question and my answer with confidence. I've upvoted your question.
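If different ids cover different day ranges (the situation raised in the comments on the question), one option is to reindex each id separately. This is a minimal sketch, assuming each id belongs to a single month and that the days to fill run from 1 up to that id's largest observed day; the helper name `fill_days` and the sample data for id 1 are hypothetical:

```python
import pandas as pd


def fill_days(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reindex one id's rows so days 1..max(day) are all present."""
    full_days = range(1, group['day'].max() + 1)
    out = group.set_index('day').reindex(full_days)
    # id and month are assumed constant within the group, so copy them into the new rows
    out['id'] = group['id'].iloc[0]
    out['month'] = group['month'].iloc[0]
    return out.rename_axis('day').reset_index()


df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 0, 0, 1, 1],
    'month': [8, 8, 8, 8, 8, 8, 9, 9],
    'day':   [1, 2, 3, 4, 6, 8, 1, 3],
    'trans': [9, 5, 10, 6, 4, 4, 7, 2],
})

# Process each id on its own, so the same day number in two ids never collides
result = pd.concat(
    [fill_days(g) for _, g in df.groupby('id')],
    ignore_index=True,
)[['id', 'month', 'day', 'trans']]
print(result)
```

Here id 0 is filled out to day 8 and id 1 only to day 3, so groups of different lengths do not need a shared day list.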

I think the best way to deal with this is to build a pandas DataFrame that holds all the [id, month, day] combinations of your output, and then left-merge your original DataFrame onto it on the [id, month, day] key.
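A minimal sketch of that merge approach, assuming the day range 1 to 8 from the question's expected output (hard-coded here) and using the question's sample rows. The grid is built as a cross join of every (id, month) pair with every day, which requires pandas 1.2 or later for `how='cross'`:

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [0, 0, 0, 0, 0, 0],
    'month': [8, 8, 8, 8, 8, 8],
    'day':   [1, 2, 3, 4, 6, 8],
    'trans': [9, 5, 10, 6, 4, 4],
})

# Every (id, month) pair that occurs in the data
keys = df[['id', 'month']].drop_duplicates()
# All days to show; 1..8 is an assumption taken from the question's expected output
days = pd.DataFrame({'day': list(range(1, 9))})

# Cartesian product: one row per (id, month, day)
grid = keys.merge(days, how='cross')

# Left-merge the data onto the grid; days with no data get NaN in 'trans'
result = grid.merge(df, on=['id', 'month', 'day'], how='left')
print(result)
```

Because the grid already carries id and month, no forward-fill is needed afterwards.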


Use pandas upsampling:

import pandas as pd

# Build a real date from month/day (the year 2020 is assumed; it only matters for February)
df['date'] = pd.to_datetime(df[['month', 'day']].assign(year=2020))
df = df.set_index('date')
# Upsampling: one row per calendar day, NaN in the inserted rows
df_daily = df.resample('D').asfreq().reset_index()

# Reassign month and day from the date, forward-fill id
df_daily['month'] = df_daily['date'].dt.month
df_daily['day'] = df_daily['date'].dt.day
df_daily['id'] = df_daily['id'].ffill().astype(int)
del df_daily['date']
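When several ids share the table, resampling the whole frame at once would put their rows on one shared timeline. A minimal sketch that upsamples each id separately, assuming (as the answer does) that every date falls in 2020; the helper name `upsample` and the sample rows are hypothetical. Using real dates lets the calendar handle month lengths, including February in a leap year:

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [0, 0, 0, 1, 1],
    'month': [8, 8, 8, 2, 3],
    'day':   [1, 2, 4, 27, 1],
    'trans': [9, 5, 6, 3, 7],
})

# 2020 is an assumed year; it decides whether February has 29 days
df['date'] = pd.to_datetime(df[['month', 'day']].assign(year=2020))


def upsample(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: insert one row per missing calendar day within a single id."""
    daily = group.set_index('date').resample('D').asfreq().rename_axis('date')
    daily['id'] = group['id'].iloc[0]
    return daily


result = pd.concat([upsample(g) for _, g in df.groupby('id')]).reset_index()

# Rebuild month/day from the dates and drop the helper column
result['month'] = result['date'].dt.month
result['day'] = result['date'].dt.day
result = result.drop(columns='date')[['id', 'month', 'day', 'trans']]
print(result)
```

For id 1 this inserts February 28 and 29 before March 1, which hard-coded day lists would miss.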
