Fill in missing dates in a groupby with a defined frequency with multiple columns

Question

Imagine I have a dataframe that looks like:

ID      DATE         VALUE_1   Value_2 ...
1    31-01-2006        5         "USD"  
1    31-01-2007        5         "USD"
1    31-01-2008        10        "USD"
1    31-01-2011        11        "USD"
2    31-12-2006        5         "USD"
2    31-12-2007        5         "USD"
2    31-12-2008        5         "USD"
2    31-12-2009        5         "USD"

With X more columns.

As you can see, this is panel data with multiple entries on the same date for different IDs. What I want to do is fill in missing dates for each ID. You can see that for ID "1" there is a gap of several years between the third and fourth entries (2008 to 2011).

I would like a dataframe that looks like the one below. Keep in mind that I am looking for a solution that stays efficient for dataframes with many value columns (30+) and many IDs (1000+). In particular, there should NOT be any data filling for IDs that are already "complete", meaning they already have the frequency specified by the data, in this case yearly. Note that even though the IDs have a yearly frequency, they don't always follow the calendar year.

ID      DATE         VALUE_1   Value_2 ...
1    31-01-2006        5         "USD"  
1    31-01-2007        5         "USD"
1    31-01-2008        10        "USD"
1    31-01-2009        NA          NA
1    31-01-2010        NA          NA
1    31-01-2011        11        "USD"
2    31-12-2006        5         "USD"
2    31-12-2007        5         "USD"
2    31-12-2008        5         "USD"
2    31-12-2009        5         "USD"
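
For reproducibility, the sample frame can be built roughly as below. The completeness check that follows is only a sketch: it assumes strictly yearly data with at most one row per ID per year, and the names `span`, `count` and `incomplete_ids` are illustrative, not part of the original post.

```python
import pandas as pd

# Sample data from the question (only the two value columns shown there)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": ["31-01-2006", "31-01-2007", "31-01-2008", "31-01-2011",
             "31-12-2006", "31-12-2007", "31-12-2008", "31-12-2009"],
    "VALUE_1": [5, 5, 10, 11, 5, 5, 5, 5],
    "Value_2": ["USD"] * 8,
})
df["DATE"] = pd.to_datetime(df["DATE"], format="%d-%m-%Y")

# An ID is "complete" when it has one row for every year between its
# first and last date (assumption: at most one row per ID per year).
years = df["DATE"].dt.year
span = years.groupby(df["ID"]).agg(lambda s: s.max() - s.min() + 1)
count = df.groupby("ID").size()
incomplete_ids = span.index[span > count].tolist()
```

Only the IDs in `incomplete_ids` would need any filling; the rest can be passed through untouched.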

Is the date range fixed or is it different for each id? What if the first two rows were missing as well in your example? Would there be anything to fill in that case?
– a_guest, Commented Feb 2, 2019 at 0:57
@a_guest The date range is different for each id. There are never missing rows for the first dates for a specific ID. The start date may vary for each ID, but dates before the start date for a specific ID will not be in the table initially.
– Chris, Commented Feb 2, 2019 at 1:53

KRKirov · Accepted Answer · 2019-02-02 21:14:50Z

Here is a fully flexible solution:

import pandas as pd

def resample_custom_freq(data):
    """ Resample one ID's rows to yearly frequency, keeping its own month and day """

    # Compute the offsets from 1 January, using the first row of the group
    month = int(data['Month'].iloc[0]) - 1
    day = int(data['Day'].iloc[0]) - 1

    # Resample to year starts, then shift back to the original month and day.
    # errors='ignore' because newer pandas versions may not pass the grouping column.
    data = data.resample('YS').last().drop(columns='ID', errors='ignore').reset_index()
    data['DATE'] += pd.DateOffset(months=month)
    data['DATE'] += pd.DateOffset(days=day)
    return data

df['DATE'] = pd.to_datetime(df['DATE'], dayfirst=True)
df['Month'] = df['DATE'].dt.month
df['Day'] = df['DATE'].dt.day
df.set_index('DATE', inplace=True, drop=True)
df_1 = df.groupby('ID').apply(resample_custom_freq).reset_index().drop(columns=['level_1', 'Month', 'Day'])

df_1
Out[264]: 
   ID       DATE  VALUE_1 Value_2
0   1 2006-01-31      5.0   "USD"
1   1 2007-01-31      5.0   "USD"
2   1 2008-01-31     10.0   "USD"
3   1 2009-01-31      NaN     NaN
4   1 2010-01-31      NaN     NaN
5   1 2011-01-31     11.0   "USD"
6   2 2006-12-31      5.0   "USD"
7   2 2007-12-31      5.0   "USD"
8   2 2008-12-31      5.0   "USD"
9   2 2009-12-31      5.0   "USD"

edited Feb 2, 2019 at 21:14

answered Feb 2, 2019 at 1:07

4 Comments

Chris Over a year ago

The solution seems really nice for monthly data. Check my updated question with regard to yearly data.

Chris Over a year ago

The problem with your solution is that it redefines the data to be end of year. I don't want that. I want the yearly frequency but not for everything to be end of year.

KRKirov Over a year ago

Here is an updated answer that works if your year ends are in January and December.

Chris Over a year ago

Is there no way to make it dynamic, i.e. suitable for all possible year ends?
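
One possible way to handle arbitrary year ends, sketched here rather than taken from the answer: for each ID, build the expected dates as yearly anniversaries of its first date and reindex against them, skipping IDs that are already complete. The function name `fill_yearly_gaps` and its parameters are hypothetical. The sketch assumes one row per ID per date and that every observed date is an exact anniversary of the ID's first date (a 29 February start maps to 28 February in non-leap years); rows that do not fall on an anniversary would be dropped by the reindex.

```python
import pandas as pd

def fill_yearly_gaps(df, id_col="ID", date_col="DATE"):
    """Insert NaN rows for missing yearly anniversaries of each ID's first date."""
    pieces = []
    for key, grp in df.groupby(id_col, sort=False):
        start, end = grp[date_col].min(), grp[date_col].max()
        n_years = end.year - start.year
        if len(grp) == n_years + 1:
            pieces.append(grp)  # already complete: leave untouched
            continue
        # Offsets are always added to `start`, so month-end dates do not drift
        expected = pd.DatetimeIndex(
            [start + pd.DateOffset(years=i) for i in range(n_years + 1)]
        )
        filled = (
            grp.set_index(date_col)
            .reindex(expected)
            .rename_axis(date_col)
            .reset_index()
        )
        filled[id_col] = key  # restore the ID on the newly inserted rows
        pieces.append(filled[df.columns])
    return pd.concat(pieces, ignore_index=True)

# Usage with the sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": pd.to_datetime(
        ["31-01-2006", "31-01-2007", "31-01-2008", "31-01-2011",
         "31-12-2006", "31-12-2007", "31-12-2008", "31-12-2009"],
        format="%d-%m-%Y",
    ),
    "VALUE_1": [5, 5, 10, 11, 5, 5, 5, 5],
    "Value_2": ["USD"] * 8,
})
result = fill_yearly_gaps(df)
```

The Python-level loop runs once per ID, which should be acceptable for a few thousand IDs, and complete IDs cost only a length check. Integer value columns become float wherever NaN rows are inserted.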