4

I can't understand why I'm getting KeyError: Timestamp('...') when using loc on date index.

With given df: dtypes are datetime64[ns], int, int, DATE1 is index

            DATE1    VALUE2  VALUE3
2021-08-20 00:00:00      11     424
2021-08-21 00:00:00      22     424
2021-08-22 00:00:00      33     424
2021-08-23 00:00:00      44     242

I'm trying to use loc on index like this:

start_date = date(2021-08-20)
end_date = date(2021-08-23)
df = df.loc[start_date:end_date]

and this is working fine. I'm getting 4 records. However when I do this:

start_date = date(2021-08-20)
end_date = date(2021-08-24) #end_date is higher than values in dataframe
df = df.loc[start_date:end_date]

I'm getting KeyError: KeyError: Timestamp('2021-08-24 00:00:00'). Could someone point me how to resolve this?

3
  • I tried it and is not reproducible. Can you use pd.to_datetime to convert start and end date strings to datetime then use .loc Commented Aug 30, 2021 at 11:40
  • what is date(2021-08-20)? You can also try df.between(start_date, end_date) instead. Commented Aug 30, 2021 at 11:40
  • @HenryYik it's datetime.date(2021-08-20). For between I'm getting 'DatetimeIndex object has no attribute between'. Commented Aug 30, 2021 at 11:47

3 Answers 3

2

In order to use label-based slices with bounds outside of index range, the index must be monotonically increasing or decreasing.

From pandas docs:

If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing() and is_monotonic_decreasing() attributes.

On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.

You can use df.sort_index to sort the index and then out of bounds slices should work.

Sign up to request clarification or add additional context in comments.

Comments

0

Use a max to cap the last date in the data:

from datetime import datetime
x = """DATE1|VALUE2|VALUE3
2021-08-20 00:00:00    |  11    | 424
2021-08-21 00:00:00    |  22   |  424
2021-08-22 00:00:00    |  33   |  424
2021-08-23 00:00:00    |  44  |   242"""

def str_to_date(s):
    return datetime.strptime(s.strip(), "%Y-%m-%d %H:%M:%S")

df = pd.read_csv(StringIO(x), sep='|')
df['DATE1'] = df['DATE1'].apply(lambda s: str_to_date(s))
df = df.set_index('DATE1')

Then:

start_date = datetime.strptime('2021-08-20', "%Y-%m-%d")
end_date = min(datetime.strptime('2021-08-24', "%Y-%m-%d"), max(df.index))

df3 = df.loc[start_date:end_date]

[out]:

    VALUE2  VALUE3
DATE1       
2021-08-20  11  424
2021-08-21  22  424
2021-08-22  33  424
2021-08-23  44  242

The line min(datetime.strptime('2021-08-24', "%Y-%m-%d"), max(df.index) does:

  • max_date = max(df.index) stores the latest date
  • min(end_date, max_date) takes the nearest/lower date

3 Comments

This feels like a hack. Why can't I just put date which is beyond max(df.index) in loc?
Because loc isn't smart enough to do a max on the column (unfortunately). Loc is looking for the index and it's a simple dictionary where the key isn't found when you try to set a date higher than anything between the range.
Similar issues reported in stackoverflow.com/a/38341066/610569
0

Which pandas version are you using? It works fine for me:

start_date = pd.to_datetime('2021-08-20')
end_date = pd.to_datetime('2121-08-24')
df.loc[start_date:end_date]

output:

            VALUE2  VALUE3
DATE1                     
2021-08-20      11     424
2021-08-21      22     424
2021-08-22      33     424
2021-08-23      44     242

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.