In the file below, the third line is the header, and the Y, M and D columns give the year, month and day respectively. However, I am not able to read it in using this code:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', 
                 parse_dates={'datetime': [0,1,2]}, 
                 date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))

         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00

I get a KeyError: List index out of range. Any suggestions?

  • This sample data works for me. Perhaps there is an issue with the data? Have you tested it to ensure that it is valid? Commented Apr 22, 2015 at 17:16
  • Yes, I am using this exact same data. Commented Apr 22, 2015 at 17:17
  • Which version of python and Pandas? Commented Apr 22, 2015 at 17:32
  • python 2.7, pandas: 0.16 Commented Apr 22, 2015 at 17:34

1 Answer

The default separator in read_csv is a comma. Your file doesn't use commas as separators, so you're only getting one big column:

>>> pd.read_csv(file_name, skiprows = 2)
       Y   M   D     PRCP     VWC1    
0   2006   1   1      0.0  0.17608E+00
1   2006   1   2      6.0  0.21377E+00
2   2006   1   3      0.1  0.22291E+00
3   2006   1   4      3.0  0.23460E+00
4   2006   1   5      6.7  0.26076E+00
>>> pd.read_csv(file_name, skiprows = 2).columns
Index([u'    Y   M   D     PRCP     VWC1    '], dtype='object')

You should be able to use delim_whitespace=True:

>>> df = pd.read_csv(file_name, skiprows = 2, delim_whitespace=True,
                     parse_dates={"datetime": [0,1,2]}, index_col="datetime")
>>> df
            PRCP     VWC1
datetime                 
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076
>>> df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None

(I didn't specify the date_parser, because I'm lazy and this would be read correctly by default, but it's actually not a bad habit to be explicit.)
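For readers on a newer pandas, here is a minimal sketch of the same idea, not a definitive recipe. As far as I know, recent releases have deprecated `delim_whitespace`, `date_parser`, and the column-combining form of `parse_dates`, so this version uses `sep=r"\s+"` and builds the dates with `pd.to_datetime` after reading. The `SAMPLE` string and the `io.StringIO` wrapper are assumptions that stand in for the real file, and the trailing spaces on the header line are omitted.

```python
import io

import pandas as pd

# Stand-in for the file in the question (assumption: same layout as the real file).
SAMPLE = """\
         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00
"""

# Skip the two preamble lines and split on runs of whitespace instead of commas.
raw = pd.read_csv(io.StringIO(SAMPLE), skiprows=2, sep=r"\s+")

# pd.to_datetime can assemble dates from columns named year/month/day,
# so rename Y, M and D before passing them in.
dates = pd.to_datetime(
    raw[["Y", "M", "D"]].rename(columns={"Y": "year", "M": "month", "D": "day"})
)

# Drop the date parts and use the assembled dates as the index.
df = raw.drop(columns=["Y", "M", "D"])
df.index = pd.DatetimeIndex(dates, name="datetime")

print(df)
print(type(df.index))
```

Building the index in a separate step also makes it easy to inspect `raw.columns` first and confirm the file was split into five columns rather than one.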

2 Comments

Thanks! delim_whitespace helps. However, when I check df.index, I do not get a time series. Any idea why?
The follow-up question for the query above is here: stackoverflow.com/questions/29805372/…
