Select rows between two DatetimeIndex dates

Question

I have a CSV file of the following format:

vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
abc-webapp-02,2017-05-31 10:15:00,0.000000
abc-webapp-02,2017-05-31 10:20:00,0.000000
abc-webapp-02,2017-05-31 10:25:00,0.000000
abc-webapp-02,2017-05-31 10:30:00,0.000000
abc-webapp-02,2017-05-31 10:35:00,0.000000
abc-webapp-02,2017-05-31 10:40:00,0.000000

I read the CSV file into a DataFrame using the following code. The date column is parsed and used as the index (a DatetimeIndex).

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv("my_file.csv", header=0, parse_dates=[1], index_col=1, date_parser=dateparse)

Now I am trying to get all the rows between two dates using the following code (the real CSV file has a large number of rows between the dates mentioned below):

df.loc['2017-05-30' : '2017-05-31']

Please note that the above approach is suggested here, but it's not working for me, so this may not be a duplicate question.

Can you show the output of df.head()?

– user2285236, Commented Jul 13, 2017 at 15:21

MaxU - stand with Ukraine · Accepted Answer · 2017-07-13 15:30:40Z

3

Using the query method:

df = pd.read_csv("my_file.csv", index_col=1, parse_dates=True)

In [121]: df.query("'2017-05-30' <= index <= '2017-06-01'")
Out[121]:
                                vm  LoadInt1
time
2017-05-31 10:00:00  abc-webapp-02  3.133333
2017-05-31 10:05:00  abc-webapp-02  0.000000
2017-05-31 10:10:00  abc-webapp-02  0.000000
2017-05-31 10:15:00  abc-webapp-02  0.000000
2017-05-31 10:20:00  abc-webapp-02  0.000000
2017-05-31 10:25:00  abc-webapp-02  0.000000
2017-05-31 10:30:00  abc-webapp-02  0.000000
2017-05-31 10:35:00  abc-webapp-02  0.000000
2017-05-31 10:40:00  abc-webapp-02  0.000000
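
For comparison, the same inclusive range can be written as a boolean mask on the index. This is a minimal sketch and not part of the answer above: the DataFrame is a synthetic stand-in for the sample CSV, and the names start, end, mask and selected are chosen only for illustration.

```python
import pandas as pd

# Synthetic stand-in for the sample data: nine 5-minute readings on 2017-05-31.
idx = pd.date_range("2017-05-31 10:00", periods=9, freq="5min", name="time")
df = pd.DataFrame({"vm": "abc-webapp-02", "LoadInt1": 0.0}, index=idx)

# Illustrative bounds; both ends are included, as in the query expression.
start = pd.Timestamp("2017-05-30")
end = pd.Timestamp("2017-06-01")

# Element-wise comparison against the DatetimeIndex, then boolean indexing.
mask = (df.index >= start) & (df.index <= end)
selected = df[mask]

print(len(selected))
```

A mask like this compares every index value individually, so it should not depend on the index being sorted.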

1 Comment

Arnab Biswas Over a year ago

Thank you! This works. I was wondering why loc is not working.

piRSquared · Accepted Answer · 2017-07-13 15:29:48Z

2

This type of index slicing includes both end points, so the slice you have covers the entire sample set:

df.loc['2017-05-30':'2017-05-31']
#df['2017-05-30':'2017-05-31']

                                vm  LoadInt1
time                                        
2017-05-31 10:00:00  abc-webapp-02  3.133333
2017-05-31 10:05:00  abc-webapp-02  0.000000
2017-05-31 10:10:00  abc-webapp-02  0.000000
2017-05-31 10:15:00  abc-webapp-02  0.000000
2017-05-31 10:20:00  abc-webapp-02  0.000000
2017-05-31 10:25:00  abc-webapp-02  0.000000
2017-05-31 10:30:00  abc-webapp-02  0.000000
2017-05-31 10:35:00  abc-webapp-02  0.000000
2017-05-31 10:40:00  abc-webapp-02  0.000000

This uses the same kind of slice, but with narrower bounds so that it actually returns a subset:

df.loc['2017-05-31 10:10':'2017-05-31 10:35']

                                vm  LoadInt1
time                                        
2017-05-31 10:10:00  abc-webapp-02       0.0
2017-05-31 10:15:00  abc-webapp-02       0.0
2017-05-31 10:20:00  abc-webapp-02       0.0
2017-05-31 10:25:00  abc-webapp-02       0.0
2017-05-31 10:30:00  abc-webapp-02       0.0
2017-05-31 10:35:00  abc-webapp-02       0.0
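
The two slices above can be reproduced without the CSV file. The sketch below assumes a synthetic frame built with pd.date_range in place of the real data; whole_day and window are illustrative names.

```python
import pandas as pd

# Synthetic stand-in for the sample data: 5-minute readings from 10:00 to 10:40.
idx = pd.date_range("2017-05-31 10:00", "2017-05-31 10:40", freq="5min", name="time")
df = pd.DataFrame({"vm": "abc-webapp-02", "LoadInt1": 0.0}, index=idx)

# Day-level strings: the end label covers all of 2017-05-31, so every row matches.
whole_day = df.loc["2017-05-30":"2017-05-31"]

# Minute-level strings: both end points are part of the result.
window = df.loc["2017-05-31 10:10":"2017-05-31 10:35"]

print(len(whole_day), len(window))
print(window.index[0], window.index[-1])
```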

Your import could be made shorter. You don't need the custom date parser:

df = pd.read_csv("my_file.csv", parse_dates=[1], index_col=1)
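
To check that this shorter import really produces a DatetimeIndex, here is a small self-contained sketch. It assumes the first rows of the sample CSV held in an in-memory string (csv_text is an illustrative name) instead of my_file.csv, and it refers to the time column by name rather than by position.

```python
import io

import pandas as pd

# First rows of the sample file, held in memory so the example runs on its own.
csv_text = """vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
"""

# Parse the "time" column as datetimes and use it as the index; no custom parser.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"], index_col="time")

print(type(df.index).__name__)  # DatetimeIndex
print(df.index.dtype)
```

If the printed type were a plain Index of strings instead, date-based slicing with .loc would not behave as shown above.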

5 Comments

Arnab Biswas Over a year ago

The problem is that df.loc['2017-05-30':'2017-05-31'] is not working for me, which surprises me. It returns an empty result, and that is still true when I drop the parser. Any thoughts why?

piRSquared Over a year ago

Are you still using the same import as before? I used the one I mentioned, and the index is a DatetimeIndex. Check that yours is too with type(df.index). If it isn't, your conversion to datetime didn't work. Try mine instead, which I posted in my third bullet. What version of pandas are you using? Check pd.__version__.

Arnab Biswas Over a year ago

The index is pandas.tseries.index.DatetimeIndex, and the pandas version is 0.18.1. Even after removing the 'dateparse' thing, I am getting the same empty result. :-(

piRSquared Over a year ago

@ArnabBiswas take off the loc and just do df['2017-05-30':'2017-05-31']

Arnab Biswas Over a year ago

Nope, it's not working. One catch is that in my CSV the time indices are not equally spaced. For example, for the latest date there is one data point every 5 minutes. For data that is one day older, there is a data point every half hour. The older the data, the sparser it is. Could that be the reason?
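
The thread does not settle what caused the empty result. Uneven spacing by itself does not stop label slicing from working on a sorted DatetimeIndex. One thing that may be worth checking, offered here only as an assumption, is whether the index is sorted. The sketch below uses hypothetical, unevenly spaced timestamps in scrambled order, tests the index with is_monotonic_increasing, and sorts it before slicing.

```python
import pandas as pd

# Hypothetical data: unevenly spaced timestamps that are also out of order.
times = pd.to_datetime([
    "2017-05-31 10:05",
    "2017-05-30 09:00",
    "2017-05-31 10:00",
    "2017-05-29 12:00",
    "2017-05-30 09:30",
])
df = pd.DataFrame({"LoadInt1": [0.0, 1.5, 3.1, 2.0, 0.7]}, index=times)

# A quick check of whether the index is sorted in ascending order.
print(df.index.is_monotonic_increasing)  # False

# Sort by the index first, then slice by date labels (both ends inclusive).
df_sorted = df.sort_index()
subset = df_sorted.loc["2017-05-30":"2017-05-31"]

print(len(subset))
```

How a date slice behaves on an unsorted index has differed between pandas versions, so sorting first is the more predictable route.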
