
I have a datetime-indexed dataframe object with 100,000+ rows. I was wondering if there was a convenient way using pandas to get a subset of this dataframe that is within multiple date ranges.

For example, let us say that we have two date ranges:

(datetime.datetime(2016,6,27,0,0,0), datetime.datetime(2016,6,27,5,0,0))

and

(datetime.datetime(2016,6,27,15,0,0), datetime.datetime(2016,6,27,23,59,59))

Let us say we want to get all rows of a dataframe object that are in either the first date range or the second date range, where the dataframe object has rows for every second from 2016-06-27 00:00:00 to 2016-06-27 23:59:59. Is there an easy way in pandas to do this?


There are two main ways to slice a DataFrame with a DatetimeIndex by date.

  • by slices: df.loc[start:end]. If there are multiple date ranges, the individual slices can be concatenated with pd.concat.

  • by boolean selection mask: df.loc[mask]


Using pd.concat and slices:

import numpy as np
import pandas as pd
np.random.seed(2016)

N = 10**2
df = pd.DataFrame(np.random.randint(10, size=(N, 2)), 
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))

result = pd.concat([df.loc['2016-6-27':'2016-6-27 5:00'],
                    df.loc['2016-6-27 15:00':'2016-6-27 23:59:59']])

yields

                     0  1
2016-06-27 00:00:00  0  2
2016-06-27 00:45:00  5  5
2016-06-27 01:30:00  9  6
2016-06-27 02:15:00  8  4
2016-06-27 03:00:00  5  0
2016-06-27 03:45:00  4  8
2016-06-27 04:30:00  7  0
2016-06-27 15:00:00  2  5
2016-06-27 15:45:00  6  7
2016-06-27 16:30:00  6  8
2016-06-27 17:15:00  5  1
2016-06-27 18:00:00  2  9
2016-06-27 18:45:00  9  1
2016-06-27 19:30:00  9  7
2016-06-27 20:15:00  3  6
2016-06-27 21:00:00  3  5
2016-06-27 21:45:00  0  8
2016-06-27 22:30:00  5  6
2016-06-27 23:15:00  0  8

Note that, unlike most slicing syntax in Python,

df.loc['2016-6-27':'2016-6-27 5:00']

is inclusive on both ends: the slice defines a closed interval, not a half-open one.
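To tie this back to the question's own data, here is a minimal sketch that builds the per-second frame described there and slices it with the datetime.datetime pairs directly. The column name value is a hypothetical placeholder. The row counts show that both endpoints are included: 00:00:00 through 05:00:00 is 5*3600 + 1 seconds.

```python
import datetime

import numpy as np
import pandas as pd

# The frame described in the question: one row per second on 2016-06-27.
# The column name 'value' is a hypothetical placeholder.
index = pd.date_range('2016-06-27 00:00:00', '2016-06-27 23:59:59', freq='s')
df = pd.DataFrame({'value': np.arange(len(index))}, index=index)

ranges = [
    (datetime.datetime(2016, 6, 27, 0, 0, 0), datetime.datetime(2016, 6, 27, 5, 0, 0)),
    (datetime.datetime(2016, 6, 27, 15, 0, 0), datetime.datetime(2016, 6, 27, 23, 59, 59)),
]

# .loc label slices accept datetime objects and include both endpoints.
pieces = [df.loc[start:end] for start, end in ranges]
subset = pd.concat(pieces)

print([len(p) for p in pieces])  # [18001, 32400]
print(len(subset))               # 50401
```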


Using a boolean selection mask:

mask = (((df.index >= '2016-6-27') & (df.index <= '2016-6-27 5:00')) 
        | ((df.index >= '2016-6-27 15:00') & (df.index < '2016-6-28')))
result2 = df.loc[mask]
assert result.equals(result2)
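For more than two ranges, one option is to build one mask per (start, end) pair and OR them together. This is a minimal sketch; the helper name in_any_range and the third range are hypothetical, chosen only for illustration, and both ends of every range are treated as inclusive.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Same kind of frame as above: 100 rows at 45-minute spacing.
np.random.seed(2016)
N = 10**2
df = pd.DataFrame(np.random.randint(10, size=(N, 2)),
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))

# Any number of (start, end) pairs; the third one is made up for illustration.
ranges = [
    ('2016-6-27 00:00', '2016-6-27 05:00'),
    ('2016-6-27 15:00', '2016-6-27 23:59:59'),
    ('2016-6-28 12:00', '2016-6-28 13:00'),
]

def in_any_range(index, ranges):
    """Hypothetical helper: True where index falls in any closed [start, end]."""
    masks = [(index >= pd.Timestamp(start)) & (index <= pd.Timestamp(end))
             for start, end in ranges]
    # Combine the per-range masks with element-wise OR.
    return reduce(np.logical_or, masks)

result3 = df.loc[in_any_range(df.index, ranges)]
print(len(result3))  # 21
```

Unlike concatenating slices with pd.concat, OR-ing masks keeps each row at most once even if two ranges overlap, and the rows stay in the frame's original order.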

Comments:

Hi @unutbu, could you please consider this question? stackoverflow.com/questions/58532543/… Thanks!
What if we had multiple date ranges and not just two?

I think the best option is to use a boolean mask directly rather than slicing with loc:

df = df[((df.index >= '2016-6-27') & (df.index <= '2016-6-27 5:00')) 
    | ((df.index >= '2016-6-27 15:00') & (df.index < '2016-6-28'))]

It works for me.

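The main caveat with slicing via loc is ordering: if the DatetimeIndex is not sorted, recent pandas versions raise a KeyError when the slice bounds are not actual index values. On a sorted index the bounds do not need to be present (in the other answer, neither 5:00 nor 23:59:59 is in the index), so calling sort_index() first also avoids the problem.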
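A minimal sketch of the ordering caveat, assuming the data arrives with a shuffled index (the frame and bounds here are made up for illustration). Sorting once with sort_index makes label slicing work even when the bounds are not index values; the boolean mask needs no sorting at all.

```python
import numpy as np
import pandas as pd

# A deliberately shuffled (non-monotonic) index, as can happen after
# concatenating data from several sources.
index = pd.date_range('2016-6-27', periods=8, freq='3h')
rng = np.random.default_rng(0)
df = pd.DataFrame({'value': np.arange(8)}, index=index[rng.permutation(8)])

start, end = '2016-6-27 01:00', '2016-6-27 10:00'  # neither bound is in the index

# Option 1: sort first, then slice; the bounds need not be present.
by_slice = df.sort_index().loc[start:end]

# Option 2: a boolean mask works regardless of order or exact bounds.
by_mask = df[(df.index >= start) & (df.index <= end)]

# Same rows either way (the mask keeps the shuffled order, so sort to compare).
assert by_slice.equals(by_mask.sort_index())
```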
