60

I am able to read and slice pandas dataframe using python datetime objects, however I am forced to use only existing dates in index. For example, this works:

>>> data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2010-12-31 00:00:00 to 2010-04-01 00:00:00
Data columns:
Adj Close    252  non-null values
dtypes: float64(1)

>>> st = datetime.datetime(2010, 12, 31, 0, 0)
>>> en = datetime.datetime(2010, 12, 28, 0, 0)

>>> data[st:en]
            Adj Close
Date                 
2010-12-31     593.97
2010-12-30     598.86
2010-12-29     601.00
2010-12-28     598.92

However if I use a start or end date that is not present in the DF, I get python KeyError.

My Question : How do I query the dataframe object for a date range; even when the start and end dates are not present in the DataFrame. Does pandas allow for range based slicing?

I am using pandas version 0.10.1

1
  • Right now, on Pandas 0.20.3, I'm actually not getting any KeyErrors, even when using datetimes outside the frame's index... So the simplest solution is currently to just do like above in the OP! df[dt.datetime(1914, 1, 1):] gives me dates from 2010. Commented Apr 29, 2019 at 12:20

6 Answers 6

58

Use searchsorted to find the nearest times first, and then use it to slice.

In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])

In [16]: df
Out[16]: 
            0
2013-01-01  1
2013-01-03  2
2013-01-05  3

In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))

In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))

In [24]: df.iloc[start:end]
Out[24]: 
            0
2013-01-03  2
Sign up to request clarification or add additional context in comments.

4 Comments

If I copy paste your example, it works fine. But the start and end variables in my program, always default to the length of the dataframe! what am I doing wrong? - pastebin.com/raw.php?i=hfpHqF7s
Seems you should sort your DataFrame in ascending order.
Thanks, it worked when the data was in sorted in ascending order.
Note that searchsorted is no longer defined on DataFrame or Series, see this question.
36

Short answer: Sort your data (data.sort()) and then I think everything will work the way you are expecting.

Yes, you can slice using datetimes not present in the DataFrame. For example:

In [12]: df
Out[12]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392
2013-04-23  0.924535
2013-04-24  0.531902
2013-04-25 -0.957936

In [13]: df['20130419':'20130422']
Out[13]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392

As you can see, you don't even have to build datetime objects; strings work.

Because the datetimes in your index are not sequential, the behavior is weird. If we shuffle the index of my example here...

In [17]: df
Out[17]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392
2013-04-23  0.924535
2013-04-21  0.531902
2013-04-25 -0.957936

...and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

In [18]: df['20130419':'20130422']
Out[18]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392

This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

df.sort_index()

5 Comments

When I try to do this, I get an python exception: TimeSeriesError: Partial indexing only valid for ordered time series.
The exception was self explanatory - I had missed sorting the data, :( - Thanks, text based slicing as you've shown above works as expected. But I used the searchsorted function since the dates in program were already datetime objects.
df['20130419':'20130422'] is exceptional! Even works with sparse data (e.g. specifying a date that doesn't exist in the index). Thank you!
Please note that data.sort() is now deprecated. The replacement for this application would be data.sort_index() - http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html
Thanks, Kapocsi. Updated.
24

You can use a simple mask to accomplish this:

date_mask = (data.index > start) & (data.index < end)
dates = data.index[date_mask]
data.ix[dates]

By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.

1 Comment

This answer needs more upvotes. I've been looking for this for weeks!
1

I had difficulty with other approaches but I found that the following approach worked for me:

# Set the Index to be the Date
df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
df.set_index('Date', inplace=True)

# Sort the Data
df = df.sort_values('Date_1')

# Slice the Data
From = '2017-05-07'
To   = '2017-06-07'
df_Z = df.loc[From:To,:]

1 Comment

You do not need more than instr.loc[From:To] Let me guess, you are also a user of R?
1
target_date = pd.Timestamp('2020-05-10')

df[df['DATE'].dt.date == (dt.date(target_date.year, target_date.month, target_date.day))]

This line will let you select all the dates that matches the condition, in case you don't neet to set "Date Column" as index.

1 Comment

Looks good. Put just to point out that the dt in df[df['DATE'].dt.date is not the same dt in dt.date, and the latter has to be exposed by import datetime as dt to avoid getting an error
1

Given a dataframe df where the date to query on is in the index, you can use query:

In [5]: df = pd.DataFrame({'Close': {'2010-12-31': 593.97, '2010-12-30': 598.86, '2010-12-29': 601.0, '2010-12-28': 598.92}})

In [6]: df
Out[6]: 
             Close
Date              
2010-12-31  593.97
2010-12-30  598.86
2010-12-29  601.00
2010-12-28  598.92

In [7]: df.query('"2010-12-01" < index <= "2010-12-29"')
Out[7]: 
             Close
Date              
2010-12-29  601.00
2010-12-28  598.92

The start and/or end date don't need to be present in the index of df.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.