1

I have a CSV file of the following format:

vm,time,LoadInt1
abc-webapp-02,2017-05-31 10:00:00,3.133333
abc-webapp-02,2017-05-31 10:05:00,0.000000
abc-webapp-02,2017-05-31 10:10:00,0.000000
abc-webapp-02,2017-05-31 10:15:00,0.000000
abc-webapp-02,2017-05-31 10:20:00,0.000000
abc-webapp-02,2017-05-31 10:25:00,0.000000
abc-webapp-02,2017-05-31 10:30:00,0.000000
abc-webapp-02,2017-05-31 10:35:00,0.000000
abc-webapp-02,2017-05-31 10:40:00,0.000000

I read the CSV file into a DataFrame using the following code. The date is parsed as index (DatetimeIndex)

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv("my_file.csv", header=0, parse_dates=[1], index_col=1, date_parser=dateparse)

Now I am trying to get all the rows between two dates using the following code (The real CSV file has large number of rows between the dates mentioned below):

df.loc['2017-05-30' : '2017-05-31']

Please note, above approach is suggested here. But, it's not working for me. So, it may not be a duplicate question.

1
  • Can you show the output of df.head()? Commented Jul 13, 2017 at 15:21

2 Answers 2

3

Using query method:

df = pd.read_csv("my_file.csv", index_col=1, parse_dates=True)

In [121]: df.query("'2017-05-30' <= index <= '2017-06-01'")
Out[121]:
                                vm  LoadInt1
time
2017-05-31 10:00:00  abc-webapp-02  3.133333
2017-05-31 10:05:00  abc-webapp-02  0.000000
2017-05-31 10:10:00  abc-webapp-02  0.000000
2017-05-31 10:15:00  abc-webapp-02  0.000000
2017-05-31 10:20:00  abc-webapp-02  0.000000
2017-05-31 10:25:00  abc-webapp-02  0.000000
2017-05-31 10:30:00  abc-webapp-02  0.000000
2017-05-31 10:35:00  abc-webapp-02  0.000000
2017-05-31 10:40:00  abc-webapp-02  0.000000
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you! This works. I was wondering why loc is not working.
2
  • This type of index slicing includes the end points and so what you have will include the entire sample set

    df.loc['2017-05-30':'2017-05-31']
    #df['2017-05-30':'2017-05-31']
    
                                    vm  LoadInt1
    time                                        
    2017-05-31 10:00:00  abc-webapp-02  3.133333
    2017-05-31 10:05:00  abc-webapp-02  0.000000
    2017-05-31 10:10:00  abc-webapp-02  0.000000
    2017-05-31 10:15:00  abc-webapp-02  0.000000
    2017-05-31 10:20:00  abc-webapp-02  0.000000
    2017-05-31 10:25:00  abc-webapp-02  0.000000
    2017-05-31 10:30:00  abc-webapp-02  0.000000
    2017-05-31 10:35:00  abc-webapp-02  0.000000
    2017-05-31 10:40:00  abc-webapp-02  0.000000
    
  • This shows the same thing but actually subsets

    df.loc['2017-05-31 10:10':'2017-05-31 10:35']
    
                                    vm  LoadInt1
    time                                        
    2017-05-31 10:10:00  abc-webapp-02       0.0
    2017-05-31 10:15:00  abc-webapp-02       0.0
    2017-05-31 10:20:00  abc-webapp-02       0.0
    2017-05-31 10:25:00  abc-webapp-02       0.0
    2017-05-31 10:30:00  abc-webapp-02       0.0
    2017-05-31 10:35:00  abc-webapp-02       0.0
    
  • Your import could be made smaller. You don't need the parser

    df = pd.read_csv("my_file.csv", parse_dates=[1], index_col=1)
    

5 Comments

The problem is df.loc['2017-05-30':'2017-05-31'] is not working for me. That is surprising me. It's returning an empty series. That is true when I don't have the parser as well. Any thoughts why?
Are you still using the same import you used. I used the one I mentioned and the index is a DatetimeIndex. Check to see yours is too. type(df.index) If it isn't then that means your conversion to datetime didn't work and. Try mine instead, which I posted in my third bullet. What version of pandas are you using? pd.__version__
The index is : pandas.tseries.index.DatetimeIndex. And the pandas version is 0.18.1. Even after removing 'dateparse' thing, I am getting the same empty result. :-(
@ArnabBiswas take off the loc and just do df['2017-05-30':'2017-05-31']
Noop. It's not working. One catch is in my CSV, the time indices are not equally spaced. For example, for the latest date, every 5 minutes, there is one data point. For one day older data, every half an hour there is a data point. The older is the data, more sparse it is. Could that be a reason?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.