Imagine I have a dataframe that looks like this:
ID DATE VALUE_1 Value_2 ...
1 31-01-2006 5 "USD"
1 31-01-2007 5 "USD"
1 31-01-2008 10 "USD"
1 31-01-2011 11 "USD"
2 31-12-2006 5 "USD"
2 31-12-2007 5 "USD"
2 31-12-2008 5 "USD"
2 31-12-2009 5 "USD"
The real dataframe has X more columns.
As you can see, this is panel data with multiple entries on the same date for different IDs. What I want to do is fill in the missing dates for each ID. For example, ID "1" has a three-year jump between its third and fourth entries (31-01-2008 to 31-01-2011), so the 2009 and 2010 rows are missing.
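A minimal sketch of how the gaps could be detected, assuming the data is a pandas DataFrame, DATE is stored as day-month-year strings, and the expected step is exactly one year. It compares the calendar years of consecutive observations within each ID; only VALUE_1 and Value_2 are included for brevity.

```python
import pandas as pd

# Copy of the sample panel from the question (only two value columns)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": ["31-01-2006", "31-01-2007", "31-01-2008", "31-01-2011",
             "31-12-2006", "31-12-2007", "31-12-2008", "31-12-2009"],
    "VALUE_1": [5, 5, 10, 11, 5, 5, 5, 5],
    "Value_2": ["USD"] * 8,
})
df["DATE"] = pd.to_datetime(df["DATE"], format="%d-%m-%Y")
df = df.sort_values(["ID", "DATE"])

# Years between consecutive observations of the same ID
# (assumes a yearly frequency, as in the example)
step = df["DATE"].dt.year.groupby(df["ID"]).diff()

# An ID has a gap if any step is larger than one year
has_gap = (step > 1).groupby(df["ID"]).any()
gap_ids = has_gap[has_gap].index.tolist()
print(gap_ids)  # [1]
```

Because `diff` works per group, the first row of each ID gets NaN and never counts as a gap.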
I would like a dataframe that looks like the one below. Keep in mind that I am looking for a solution that works for dataframes with many value columns (30+) and many IDs (1000+) and is still efficient. That is, there should NOT be any filling for IDs that are already "complete", meaning that they already have the frequency implied by the data (in this case, yearly). Note, however, that even a yearly frequency does not always follow the calendar year: ID 1 is observed every 31 January, ID 2 every 31 December.
ID DATE VALUE_1 Value_2 ...
1 31-01-2006 5 "USD"
1 31-01-2007 5 "USD"
1 31-01-2008 10 "USD"
1 31-01-2009 NA NA
1 31-01-2010 NA NA
1 31-01-2011 11 "USD"
2 31-12-2006 5 "USD"
2 31-12-2007 5 "USD"
2 31-12-2008 5 "USD"
2 31-12-2009 5 "USD"
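One possible approach, sketched in pandas under these assumptions: DATE is already a datetime column, each ID is observed on the same day and month every year, and the function name `fill_yearly_gaps` is hypothetical. It only rebuilds IDs whose row count is smaller than the number of years they span, so complete IDs pass through untouched. The expected dates are generated as first date + k years (not by repeatedly adding one year), which keeps each ID on its own non-calendar anchor.

```python
import pandas as pd


def fill_yearly_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Insert missing yearly rows (value columns NaN) for IDs with gaps.

    Assumes DATE is datetime64 and each ID repeats the same day/month yearly.
    """
    df = df.sort_values(["ID", "DATE"])
    years = df["DATE"].dt.year
    # Years spanned by each ID vs. the number of rows it actually has
    span = years.groupby(df["ID"]).agg(["min", "max", "size"])
    expected = span["max"] - span["min"] + 1
    gap_ids = span.index[(expected > span["size"]).to_numpy()]
    if len(gap_ids) == 0:
        return df.reset_index(drop=True)

    first = df.groupby("ID")["DATE"].min()
    # Full expected (ID, DATE) grid, built only for IDs that have gaps
    grid = pd.DataFrame(
        [(i, first[i] + pd.DateOffset(years=k))
         for i in gap_ids for k in range(int(expected[i]))],
        columns=["ID", "DATE"],
    ).astype({"DATE": df["DATE"].dtype})

    # Left merge keeps every expected date; unmatched rows get NaN
    filled = grid.merge(df[df["ID"].isin(gap_ids)], on=["ID", "DATE"], how="left")
    complete = df[~df["ID"].isin(gap_ids)]
    out = pd.concat([filled, complete], ignore_index=True)
    return out.sort_values(["ID", "DATE"]).reset_index(drop=True)


# Usage with the sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "DATE": pd.to_datetime(
        ["31-01-2006", "31-01-2007", "31-01-2008", "31-01-2011",
         "31-12-2006", "31-12-2007", "31-12-2008", "31-12-2009"],
        format="%d-%m-%Y"),
    "VALUE_1": [5, 5, 10, 11, 5, 5, 5, 5],
    "Value_2": ["USD"] * 8,
})
out = fill_yearly_gaps(df)
print(out)
```

Inserted rows have NaN in every value column, so integer columns such as VALUE_1 become float. If the observed dates drift (e.g. last business day instead of a fixed day), the exact-date merge would no longer line up and a tolerance-based match would be needed instead.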