Pandas date index loc between dates throws KeyError when edge date is not in dataframe

Question

I can't understand why I'm getting KeyError: Timestamp('...') when using loc on date index.

With given df: dtypes are datetime64[ns], int, int, DATE1 is index

            DATE1    VALUE2  VALUE3
2021-08-20 00:00:00      11     424
2021-08-21 00:00:00      22     424
2021-08-22 00:00:00      33     424
2021-08-23 00:00:00      44     242

I'm trying to use loc on index like this:

start_date = date(2021-08-20)
end_date = date(2021-08-23)
df = df.loc[start_date:end_date]

and this is working fine. I'm getting 4 records. However when I do this:

start_date = date(2021-08-20)
end_date = date(2021-08-24) #end_date is higher than values in dataframe
df = df.loc[start_date:end_date]

I'm getting KeyError: KeyError: Timestamp('2021-08-24 00:00:00'). Could someone point me how to resolve this?

I tried it and is not reproducible. Can you use pd.to_datetime to convert start and end date strings to datetime then use .loc — ThePyGuy
– ThePyGuy, Commented Aug 30, 2021 at 11:40
what is date(2021-08-20)? You can also try df.between(start_date, end_date) instead. — Henry Yik
– Henry Yik, Commented Aug 30, 2021 at 11:40
@HenryYik it's datetime.date(2021-08-20). For between I'm getting 'DatetimeIndex object has no attribute between'. — Kalik
– Kalik, Commented Aug 30, 2021 at 11:47

nikniknik · Accepted Answer · 2022-10-03 11:17:29Z

2

In order to use label-based slices with bounds outside of index range, the index must be monotonically increasing or decreasing.

From pandas docs:

If the index of a Series or DataFrame is monotonically increasing or decreasing, then the bounds of a label-based slice can be outside the range of the index, much like slice indexing a normal Python list. Monotonicity of an index can be tested with the is_monotonic_increasing() and is_monotonic_decreasing() attributes.

On the other hand, if the index is not monotonic, then both slice bounds must be unique members of the index.

You can use df.sort_index to sort the index and then out of bounds slices should work.

answered Oct 3, 2022 at 11:17

nikniknik

1912 silver badges11 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

alvas · Accepted Answer · 2021-08-30 11:46:50Z

0

Use a max to cap the last date in the data:

from datetime import datetime
x = """DATE1|VALUE2|VALUE3
2021-08-20 00:00:00    |  11    | 424
2021-08-21 00:00:00    |  22   |  424
2021-08-22 00:00:00    |  33   |  424
2021-08-23 00:00:00    |  44  |   242"""

def str_to_date(s):
    return datetime.strptime(s.strip(), "%Y-%m-%d %H:%M:%S")

df = pd.read_csv(StringIO(x), sep='|')
df['DATE1'] = df['DATE1'].apply(lambda s: str_to_date(s))
df = df.set_index('DATE1')

Then:

start_date = datetime.strptime('2021-08-20', "%Y-%m-%d")
end_date = min(datetime.strptime('2021-08-24', "%Y-%m-%d"), max(df.index))

df3 = df.loc[start_date:end_date]

[out]:

    VALUE2  VALUE3
DATE1       
2021-08-20  11  424
2021-08-21  22  424
2021-08-22  33  424
2021-08-23  44  242

The line min(datetime.strptime('2021-08-24', "%Y-%m-%d"), max(df.index) does:

max_date = max(df.index) stores the latest date
min(end_date, max_date) takes the nearest/lower date

answered Aug 30, 2021 at 11:46

alvas

123k118 gold badges503 silver badges807 bronze badges

3 Comments

Kalik Over a year ago

This feels like a hack. Why can't I just put date which is beyond max(df.index) in loc?

alvas Over a year ago

Because loc isn't smart enough to do a max on the column (unfortunately). Loc is looking for the index and it's a simple dictionary where the key isn't found when you try to set a date higher than anything between the range.

alvas Over a year ago

Similar issues reported in stackoverflow.com/a/38341066/610569

mozway · Accepted Answer · 2021-08-30 12:04:06Z

0

Which pandas version are you using? It works fine for me:

start_date = pd.to_datetime('2021-08-20')
end_date = pd.to_datetime('2121-08-24')
df.loc[start_date:end_date]

output:

            VALUE2  VALUE3
DATE1                     
2021-08-20      11     424
2021-08-21      22     424
2021-08-22      33     424
2021-08-23      44     242

answered Aug 30, 2021 at 12:04

mozway

267k13 gold badges56 silver badges106 bronze badges

Collectives™ on Stack Overflow

Pandas date index loc between dates throws KeyError when edge date is not in dataframe

3 Answers 3

Comments

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related