KeyError when reading from Excel data into dataframe

Question

I have an Excel file with two sheets and I am trying to read both of them into a dataframe as in the code below. However, I get the error

KeyError: "['months_to_maturity' 'asset_id' 'orig_iss_dt' 'maturity_dt' 'pay_freq_cd'\n 'coupon' 'closing_price'] not in index"

in the line

return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

in the SecondExcelFileReader() function. However, both sheets have the headers

asset_id    orig_iss_dt maturity_dt  pay_freq_cd    coupon  closing_price   months_to_maturity

I return df with the columns listed explicitly, as shown below, because that is the order in which I want the columns.

import pandas as pd


def ExcelFileReader():
    # Read the first sheet and return its columns in the desired order.
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[0])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]


def SecondExcelFileReader():
    # Read the second sheet; this is where the KeyError is raised.
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[1])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

def mergingdataframes():
    # Stack the two sheets into a single dataframe.
    df1 = ExcelFileReader()
    df2 = SecondExcelFileReader()
    return pd.concat([df1, df2])

Edit: This Excel file was exported from Sybase Oracle SQL Developer, so the first sheet already came with the titles. I just copied and pasted the second sheet with the same titles. Also, I am only having the issue with the second sheet.

Sheet 1: (screenshot not preserved)

Sheet 2: (screenshot not preserved)

@AnandSKumar I don't have an issue with the first sheet. I do have to explain that this Excel file was exported from Sybase Oracle SQL Developer and hence the first sheet already came with the titles. I just copied and pasted the second sheet with the titles, though.
– user131983, Commented Jul 23, 2015 at 17:49
Can you show the second sheet and first sheet (maybe a screenshot)?
– Anand S Kumar, Commented Jul 23, 2015 at 17:51
First look at the output df before selecting a subset of the columns. If it looks good, try to select each column individually and see if you get a KeyError. If so, it could be something silly like extra whitespace in one of the column names.
– JoeCondron, Commented Jul 23, 2015 at 18:00
@user131983 Why are you reading those files in that manner? Why not use pandas.read_excel? This method includes a number of arguments to control the parsing, including a sheetname argument.
– kennes, Commented Jul 23, 2015 at 18:32
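
The check suggested in the comments (look at the parsed column labels before selecting a subset, and watch for stray whitespace) could be sketched roughly as follows. This is only an illustration: the helper names missing_columns and strip_column_names are made up for this example, and the small dataframe is a stand-in for a parsed sheet, not the real file.

```python
import pandas as pd


def missing_columns(df, wanted):
    """Return the names in `wanted` that are not column labels of `df`."""
    return [name for name in wanted if name not in df.columns]


def strip_column_names(df):
    """Return a copy of `df` with whitespace stripped from string column labels."""
    return df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)


# Hypothetical stand-in for a parsed sheet: one header carries a trailing space.
df = pd.DataFrame({"asset_id ": [1], "coupon": [2.5]})
wanted = ["asset_id", "coupon"]

# repr() makes invisible characters in the labels visible.
print([repr(c) for c in df.columns])

# Names that df[wanted] would fail on.
print(missing_columns(df, wanted))

# After stripping whitespace from the labels, nothing is missing.
clean = strip_column_names(df)
print(missing_columns(clean, wanted))
```

If every requested name shows up as missing, as in the error message above, whitespace is a less likely cause than the header row not being parsed at all, so printing df.head() for the second sheet is a sensible next step.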

Skorpeo · Accepted Answer · 2015-07-25 04:28:07Z

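def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    # Position of the first sheet (always 0 here); an integer selects a sheet by position, not by name.
    sheet_num = xls.sheet_names.index(xls.sheet_names[0])
    # The keyword is sheet_name in current pandas; it was spelled sheetname before pandas 0.21.
    df = pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=sheet_num)
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' ,'pay_freq_cd', 'coupon', 'closing_price']]

Alternatively, in this case you could pass sheet_name=0 directly instead of computing sheet_num from xls.sheet_names.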

It looks like the issue is that your second sheet is named "Sheet1". Based on the ExcelFile.parse documentation, "Sheet1" means the first sheet, but in your file it is the second sheet. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html

A better implementation would be:

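def mergingdataframes():
    # With a list of sheet positions, read_excel returns a dict of dataframes keyed by position.
    # The keyword is sheet_name in current pandas; it was spelled sheetname before pandas 0.21.
    mergedf = pd.concat(pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=[0, 1]))
    mergedf.index = mergedf.index.droplevel(0)  # drop the dict keys that concat adds as the outer index level
    return mergedf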
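
As a variant of the merge above, here is a minimal sketch that also applies the column order asked for in the question. It is not the code from the answer: merge_sheets is a hypothetical helper, the dict of small dataframes stands in for what read_excel returns when given a list of sheet positions, and only three of the seven columns are used to keep it short. It passes ignore_index=True to pd.concat instead of dropping the outer index level afterwards, which gives a fresh 0..n-1 index rather than repeating each sheet's row numbers.

```python
import pandas as pd

# Shortened column list for illustration; the question uses seven columns.
COLUMNS = ["months_to_maturity", "asset_id", "coupon"]

# Stand-in for the dict that read_excel returns for a list of sheet positions.
sheets = {
    0: pd.DataFrame(
        {"asset_id": [1, 2], "coupon": [2.5, 3.0], "months_to_maturity": [12, 24]}
    ),
    1: pd.DataFrame(
        {"asset_id": [3], "coupon": [1.75], "months_to_maturity": [6]}
    ),
}


def merge_sheets(sheets, columns):
    """Reorder each sheet's columns, then stack the sheets with a fresh index."""
    ordered = [frame[columns] for frame in sheets.values()]
    return pd.concat(ordered, ignore_index=True)


merged = merge_sheets(sheets, COLUMNS)
print(merged)
```

Selecting frame[columns] per sheet also means a sheet with missing headers fails on its own, which makes it easier to see which sheet is the problem.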

edited Jul 25, 2015 at 4:28

answered Jul 25, 2015 at 3:29

Skorpeo
