I have a dataframe in which I am trying to fill in the missing months, carrying forward the value from the previous month.
| Score_Date | Num | Name | Score |
|:----------:|:----:|:------------------:|:-----:|
| 2019-12-01 | 4544 | ABC ELECTRONICS CO | 50 |
| 2020-03-01 | 4544 | ABC ELECTRONICS CO | 75 |
| 2020-06-01 | 4544 | ABC ELECTRONICS CO | 90 |
| 2020-09-01 | 4544 | ABC ELECTRONICS CO | 50 |
Ideally, the dataframe would look like:
| Score_Date | Num | Name | Score |
|:----------:|:----:|:------------------:|:-----:|
| 2019-12-01 | 4544 | ABC ELECTRONICS CO | 50 |
| 2020-01-01 | 4544 | ABC ELECTRONICS CO | 50 |
| 2020-02-01 | 4544 | ABC ELECTRONICS CO | 50 |
| 2020-03-01 | 4544 | ABC ELECTRONICS CO | 75 |
| 2020-04-01 | 4544 | ABC ELECTRONICS CO | 75 |
| 2020-05-01 | 4544 | ABC ELECTRONICS CO | 75 |
| 2020-06-01 | 4544 | ABC ELECTRONICS CO | 90 |
| 2020-07-01 | 4544 | ABC ELECTRONICS CO | 90 |
| 2020-08-01 | 4544 | ABC ELECTRONICS CO | 90 |
| 2020-09-01 | 4544 | ABC ELECTRONICS CO | 50 |
That is, each missing month should take the value of the month before it, which I am trying to do with pandas `ffill()`.
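For a single company, the fill described above can be sketched on a plain Series: `asfreq("MS")` inserts the missing month-start dates as `NaN`, and `ffill()` carries the last known score forward. This is a minimal sketch that assumes the dates are already parsed as datetimes; `scores` and `monthly` are illustrative names.

```python
import pandas as pd

# Scores for one company, copied from the table above
scores = pd.Series(
    [50, 75, 90, 50],
    index=pd.to_datetime(["2019-12-01", "2020-03-01", "2020-06-01", "2020-09-01"]),
    name="Score",
)

# asfreq("MS") builds every month start from the first to the last date,
# leaving NaN where no score exists; ffill() then copies the last score forward.
monthly = scores.asfreq("MS").ffill().astype(int)  # NaN forced float, so cast back
print(monthly)
```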
I found this post and tried to implement a solution:
```python
def expand_dates(grp):
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='M')
    return grp.reindex(index).ffill()
test_df = test_df.set_index('Score_Date')
test_df = test_df.groupby('Name')['Score'].apply(expand_dates)
print(pd.concat([test_df.head(), test_df.tail()]))
```
Instead, I get this error:
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-26-d27510150dc0> in <module>
      4     index = pd.date_range(start, end, freq='M')
      5     return grp.reindex(index).ffill()
----> 6 test_df = test_df.set_index('Score_Date')
      7 test_df = test_df.groupby('Name')['Score'].apply(expand_dates)
      8 print(pd.concat([test_df.head(), test_df.tail()]))

c:\python367-64\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4553
   4554         if missing:
-> 4555             raise KeyError(f"None of {missing} are in the columns")
   4556
   4557         if inplace:

KeyError: "None of ['Score_Date'] are in the columns"
```
Note that `print(test_df.columns)` shows `Index(['Num', 'Name', 'Score'], dtype='object')`, yet `Score_Date` does show up when I `print(test_df)`.
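One common way to get exactly this symptom (an assumption here, not something the post confirms) is that `Score_Date` is already the index, for example because the notebook cell was run a second time after `set_index` had already moved it out of the columns; the `ipython-input` frame in the traceback suggests an IPython/Jupyter session. `print(df)` still displays the index name, but `df.columns` no longer lists it, and `set_index` only searches the columns. The toy frame below is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Score_Date": pd.to_datetime(["2019-12-01", "2020-03-01"]),
    "Num": [4544, 4544],
    "Name": ["ABC ELECTRONICS CO", "ABC ELECTRONICS CO"],
    "Score": [50, 75],
})

df = df.set_index("Score_Date")  # first run: Score_Date moves from the columns to the index
print(df.columns.tolist())       # ['Num', 'Name', 'Score']
print(df.index.name)             # Score_Date (print(df) still shows it, as the index)

try:
    df = df.set_index("Score_Date")  # running the same line again
except KeyError as exc:
    print("KeyError:", exc)          # set_index looks only in df.columns

# Moving the index back into the columns makes set_index work again
df = df.reset_index().set_index("Score_Date")
```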
CSV data:

```
Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
```
(Re: `print(test_df.columns)` and the `encoding` argument of `pd.read_csv`: I load the data with a plain `pd.read_csv` call, nothing fancy there. @Ralubrusto)
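A sketch of the whole per-company fill, starting from the CSV data above. It keeps the idea of the `expand_dates` attempt but rests on a few assumptions: the dates are parsed with `parse_dates`, each company's range runs from its first to its last month (rather than to `today`, which is not defined in the snippet), and the month-start frequency `"MS"` is used because `freq='M'` generates month-end dates (such as 2020-01-31) that never match the existing 1st-of-month rows, so the reindex would produce only `NaN`. `expand_months` is a hypothetical helper name.

```python
import io

import pandas as pd

# The CSV data from the question, inlined so the sketch runs on its own
csv_text = """Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
"""

# parse_dates turns Score_Date into datetime64 so it can back a DatetimeIndex
test_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Score_Date"])


def expand_months(grp):
    """Hypothetical helper: reindex one company to every month start, then forward-fill."""
    grp = grp.set_index("Score_Date").sort_index()
    # "MS" = month start, matching dates like 2020-01-01
    months = pd.date_range(grp.index.min(), grp.index.max(), freq="MS")
    # New months appear as all-NaN rows; ffill copies the previous month's values down
    return grp.reindex(months).ffill()


# Expand each company separately so one company's scores never leak into another's
pieces = [expand_months(grp) for _, grp in test_df.groupby("Name")]

filled = (
    pd.concat(pieces)
    .rename_axis("Score_Date")            # the reindexed index has no name; restore it
    .reset_index()                        # move Score_Date back into a regular column
    .astype({"Num": int, "Score": int})   # NaN during reindex forced float; cast back
)
print(filled)
```

To extend each company up to the current month instead, as the original `end = today` seems to intend, `grp.index.max()` could be swapped for `pd.Timestamp.today()`. Iterating over the groups and concatenating is used here instead of `groupby(...).apply(...)` simply to keep the grouping columns and the resulting index easy to follow.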