Python pandas for reading in file with date

Question

In the file below, the third line is the header, and the Y, M and D columns give the year, month and day respectively. However, I am not able to read them in as dates using this code:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', 
                 parse_dates={'datetime': [0,1,2]}, 
                 date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))

         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00

I get a KeyError: List index out of range. Any suggestions?

This sample data works for me. Perhaps there is an issue with the data? Have you tested it to ensure that it is valid?
– Alexander, Commented Apr 22, 2015 at 17:16

DSM · Accepted Answer · 2015-04-22 17:34:45Z

The default separator in read_csv is a comma. Your file doesn't use commas as separators, so you're only getting one big column:

>>> pd.read_csv(file_name, skiprows = 2)
       Y   M   D     PRCP     VWC1    
0   2006   1   1      0.0  0.17608E+00
1   2006   1   2      6.0  0.21377E+00
2   2006   1   3      0.1  0.22291E+00
3   2006   1   4      3.0  0.23460E+00
4   2006   1   5      6.7  0.26076E+00
>>> pd.read_csv(file_name, skiprows = 2).columns
Index([u'    Y   M   D     PRCP     VWC1    '], dtype='object')

You should be able to use delim_whitespace=True:

>>> df = pd.read_csv(file_name, skiprows = 2, delim_whitespace=True,
                     parse_dates={"datetime": [0,1,2]}, index_col="datetime")
>>> df
            PRCP     VWC1
datetime                 
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076
>>> df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None

(I didn't specify the date_parser, because I'm lazy and this would be read correctly by default, but it's actually not a bad habit to be explicit.)
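For anyone who wants to be explicit about the date handling, here is a minimal sketch of one way to do it. It is not the code from the answer above: it reads the columns as plain numbers and then builds the index with pd.to_datetime, which can assemble dates from columns named year, month and day. It also uses sep=r"\s+" in place of delim_whitespace=True, on the assumption that you are on a recent pandas release where delim_whitespace, date_parser and the combined-column form of parse_dates are deprecated. The inline sample text stands in for file_name so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample text copied from the question; in a real script pass file_name instead.
raw = """         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00
"""

# Skip the two preamble lines and split fields on runs of whitespace.
df = pd.read_csv(io.StringIO(raw), skiprows=2, sep=r"\s+")

# to_datetime builds dates from columns named year/month/day,
# so rename Y/M/D before passing them in.
dates = pd.to_datetime(
    df[["Y", "M", "D"]].rename(columns={"Y": "year", "M": "month", "D": "day"})
)

# Use the dates as the index and drop the now-redundant parts.
df.index = pd.DatetimeIndex(dates, name="datetime")
df = df.drop(columns=["Y", "M", "D"])

print(df)
print(type(df.index))
```

Building the index in a separate step keeps the parsing rule visible, and it makes it easy to inspect the Y, M and D columns first if the dates do not come out as expected.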


2 Comments

user308827 Over a year ago

Thanks! delim_whitespace helps. However, when I look at df.index, I do not get a time series. Any idea why?

user308827 Over a year ago

The follow-up question to the query above is here: stackoverflow.com/questions/29805372/…
