Forward fill dates failing in pandas - Python 3.6

Question

I have a dataframe in which I am trying to fill in missing months, keeping the value of previous months.

| Score_Date |  Num |        Name        | Score |
|:----------:|:----:|:------------------:|:-----:|
| 2019-12-01 | 4544 | ABC ELECTRONICS CO |   50  |
| 2020-03-01 | 4544 | ABC ELECTRONICS CO |   75  |
| 2020-06-01 | 4544 | ABC ELECTRONICS CO |   90  |
| 2020-09-01 |  454 | ABC ELECTRONICS CO |   50  |

Ideally, the dataframe would look like:

| Score_Date |  Num |        Name        | Score |
|:----------:|:----:|:------------------:|:-----:|
| 2019-12-01 | 4544 | ABC ELECTRONICS CO |   50  |
| 2020-01-01 | 4544 | ABC ELECTRONICS CO |   50  |
| 2020-02-01 | 4544 | ABC ELECTRONICS CO |   50  |
| 2020-03-01 | 4544 | ABC ELECTRONICS CO |   75  |
| 2020-04-01 | 4544 | ABC ELECTRONICS CO |   75  |
| 2020-05-01 | 4544 | ABC ELECTRONICS CO |   75  |
| 2020-06-01 | 4544 | ABC ELECTRONICS CO |   90  |
| 2020-07-01 | 4544 | ABC ELECTRONICS CO |   90  |
| 2020-08-01 | 4544 | ABC ELECTRONICS CO |   90  |
| 2020-09-01 | 4544 | ABC ELECTRONICS CO |   50  |

That is, each missing month is filled with the value from the month before, using pandas ffill().

I found this post and tried to implement a solution:

def expand_dates(grp):
    start = grp.index.min()
    end = today
    index = pd.date_range(start, end, freq='M')
    return grp.reindex(index).ffill()
test_df = test_df.set_index('Score_Date')
test_df = test_df.groupby('Name')['Score'].apply(expand_dates)
print(pd.concat([test_df.head(), test_df.tail()]))

Yet I receive:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-26-d27510150dc0> in <module>
      4     index = pd.date_range(start, end, freq='M')
      5     return grp.reindex(index).ffill()
----> 6 test_df = test_df.set_index('Score_Date')
      7 test_df = test_df.groupby('Name')['Score'].apply(expand_dates)
      8 print(pd.concat([test_df.head(), test_df.tail()]))

c:\python367-64\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4553 
   4554         if missing:
-> 4555             raise KeyError(f"None of {missing} are in the columns")
   4556 
   4557         if inplace:

KeyError: "None of ['Score_Date'] are in the columns"

Note, print(test_df.columns) reveals: Index(['Num', 'Name', 'Score'], dtype='object'), yet if I print(test_df), the column shows up.

CSV Data:

Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50

The error is quite clear: 'Score_Date' doesn't exist in the dataframe's columns. Check for leading and trailing spaces with print(test_df.columns).
– Umar.H, Commented Dec 15, 2020 at 17:53
May I ask how you load your dataframe? In this case, for example, the problem is with the encoding argument in pd.read_csv.
– Ralubrusto, Commented Dec 15, 2020 at 18:11
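
One possible explanation for the symptom in the question (the column is missing from test_df.columns yet visible in print(test_df)) is that Score_Date has already become the index, for example because the set_index line ran in an earlier execution of the notebook cell. The source does not confirm this, so the sketch below only reproduces that assumed scenario with the question's CSV data; the variables in_columns, index_name and second_call_failed are illustrative names.

```python
import io
import pandas as pd

csv_text = """Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
"""

test_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Score_Date"])

# First execution: Score_Date moves from the columns to the index.
test_df = test_df.set_index("Score_Date")
in_columns = "Score_Date" in test_df.columns  # no longer a column
index_name = test_df.index.name               # now the index name

# Assumed scenario: the same line runs a second time on the same object.
try:
    test_df.set_index("Score_Date")
    second_call_failed = False
except KeyError:
    second_call_failed = True

# A guard that makes the cell safe to re-run.
if test_df.index.name != "Score_Date":
    test_df = test_df.set_index("Score_Date")

print(in_columns, index_name, second_call_failed)
```

If this is what happened, print(test_df) still shows Score_Date because the index is printed alongside the columns.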

Michael Szczesny · Accepted Answer · 2020-12-15 20:45:21Z

Assuming the column Score_Date is datetime, e.g. imported with

df = pd.read_csv('yourdata.csv', parse_dates=['Score_Date'])

You can use df.reindex with method='ffill':

df.set_index('Score_Date', inplace=True)
df_test = (df.reindex(pd.date_range(df.index.min(), df.index.max(), freq='MS'), method='ffill')
             .rename_axis('Score_Date').reset_index())
print(df_test)

Out:

  Score_Date   Num                Name  Score
0 2019-12-01  4544  ABC ELECTRONICS CO     50
1 2020-01-01  4544  ABC ELECTRONICS CO     50
2 2020-02-01  4544  ABC ELECTRONICS CO     50
3 2020-03-01  4544  ABC ELECTRONICS CO     75
4 2020-04-01  4544  ABC ELECTRONICS CO     75
5 2020-05-01  4544  ABC ELECTRONICS CO     75
6 2020-06-01  4544  ABC ELECTRONICS CO     90
7 2020-07-01  4544  ABC ELECTRONICS CO     90
8 2020-08-01  4544  ABC ELECTRONICS CO     90
9 2020-09-01   454  ABC ELECTRONICS CO     50
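
The question's code groups by Name, while the snippet above handles a single company. A minimal sketch of a per-company variant of the same reindex idea follows, assuming dates are unique within each company. The second company (XYZ SUPPLY LTD) and the helper name fill_missing_months are hypothetical and added only for illustration.

```python
import io
import pandas as pd

# The question's CSV plus two made-up rows for a hypothetical second company.
csv_text = """Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
2020-01-01,9001,XYZ SUPPLY LTD,10
2020-03-01,9001,XYZ SUPPLY LTD,20
"""


def fill_missing_months(df):
    """Forward-fill missing month starts separately for each Name."""
    pieces = []
    for _, grp in df.groupby("Name"):
        # reindex(method='ffill') needs a sorted (monotonic) index.
        grp = grp.set_index("Score_Date").sort_index()
        months = pd.date_range(grp.index.min(), grp.index.max(), freq="MS")
        filled = grp.reindex(months, method="ffill")
        pieces.append(filled.rename_axis("Score_Date").reset_index())
    return pd.concat(pieces, ignore_index=True)


df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Score_Date"])
result = fill_missing_months(df)
print(result)
```

Looping over the groups and concatenating is used here instead of groupby(...).apply(...) only to keep the shape of the result easy to predict; each company is filled between its own first and last date.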

edited Dec 15, 2020 at 20:45

answered Dec 15, 2020 at 18:11

How can I assign that reindex line to a new dataframe? Or do it in place? If I make the next line print(test_df), it shows the original.
– artemis, Commented over a year ago
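
On the comment's question: DataFrame.reindex returns a new DataFrame and has no inplace option, so the result has to be assigned to a name. A minimal sketch, assuming the question's CSV data; the names months and filled are illustrative.

```python
import io
import pandas as pd

csv_text = """Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
"""

test_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Score_Date"])
test_df = test_df.set_index("Score_Date")

months = pd.date_range(test_df.index.min(), test_df.index.max(), freq="MS")

# reindex does not modify test_df; it returns a new frame.
filled = test_df.reindex(months, method="ffill").rename_axis("Score_Date")

rows_before = len(test_df)  # the original still has only the source rows
rows_after = len(filled)    # the new frame has one row per month

# To "overwrite" the original, rebind the name to the result.
test_df = filled
print(test_df)
```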

Quang Hoang · Accepted Answer · 2020-12-15 18:15:18Z

Let's try reading the data and parsing the dates in one step; then you can use asfreq:

df = pd.read_clipboard(sep=',',parse_dates=True, index_col=0); df

df.asfreq('MS').ffill()

Output:

               Num                Name  Score
Score_Date                                   
2019-12-01  4544.0  ABC ELECTRONICS CO   50.0
2020-01-01  4544.0  ABC ELECTRONICS CO   50.0
2020-02-01  4544.0  ABC ELECTRONICS CO   50.0
2020-03-01  4544.0  ABC ELECTRONICS CO   75.0
2020-04-01  4544.0  ABC ELECTRONICS CO   75.0
2020-05-01  4544.0  ABC ELECTRONICS CO   75.0
2020-06-01  4544.0  ABC ELECTRONICS CO   90.0
2020-07-01  4544.0  ABC ELECTRONICS CO   90.0
2020-08-01  4544.0  ABC ELECTRONICS CO   90.0
2020-09-01  4544.0  ABC ELECTRONICS CO   50.0
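
The .0 values in the output above appear because asfreq first inserts rows of NaN for the missing months, which upcasts the integer columns to float before ffill runs. A minimal sketch of one way to keep the integers, assuming the question's CSV data: pass method='ffill' to asfreq itself so the gaps are filled during reindexing. The names as_float and as_int are illustrative.

```python
import io
import pandas as pd

csv_text = """Score_Date,Num,Name,Score
2019-12-01,4544,ABC ELECTRONICS CO,50
2020-03-01,4544,ABC ELECTRONICS CO,75
2020-06-01,4544,ABC ELECTRONICS CO,90
2020-09-01,4544,ABC ELECTRONICS CO,50
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Score_Date"],
    index_col="Score_Date",
)

# Two steps: NaN rows are created first, so Num and Score become float.
as_float = df.asfreq("MS").ffill()

# One step: values are forward-filled while reindexing, so no NaN appears.
as_int = df.asfreq("MS", method="ffill")

print(as_float.dtypes)
print(as_int.dtypes)
```

Casting afterwards, for example with astype on the Num and Score columns, would be another option for the two-step version.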
