I have an Excel file with two sheets and I am trying to read both of them into a dataframe as in the code below. However, I get the error

KeyError: "['months_to_maturity' 'asset_id' 'orig_iss_dt' 'maturity_dt' 'pay_freq_cd'\n 'coupon' 'closing_price'] not in index"

in the line

return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

in the SecondExcelFileReader() function. However, both sheets have the headers

asset_id    orig_iss_dt maturity_dt  pay_freq_cd    coupon  closing_price   months_to_maturity

I return df as follows as that is the order in which I want the columns.

import pandas as pd

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[0])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]


def SecondExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    df = xls.parse(xls.sheet_names[1])
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt' , 'pay_freq_cd', 'coupon', 'closing_price']]

def mergingdataframes():
    df1 = ExcelFileReader()
    df2 = SecondExcelFileReader()
    return pd.concat([df1, df2])

Edit: This Excel file was exported from Sybase Oracle SQL Developer and hence the first sheet came already with the titles. I just copied and pasted the second sheet with the same titles. Also, I am only having the issue with the second sheet.

Sheet 1: Sheet 1

Sheet 2: Sheet 2

  • You do not get the issue for first sheet? Commented Jul 23, 2015 at 17:40
  • @AnandSKumar I don't have an issue with the first sheet. I do have to explain that this Excel file was exported from Sybase Oracle SQL Developer and hence the first sheet came already with the titles. I just copied and pasted the 2nd sheet with the titles though. Commented Jul 23, 2015 at 17:49
  • Can you show the second sheet and first sheet (maybe a screenshot)? Commented Jul 23, 2015 at 17:51
  • First look at the output df before selecting a subset of the columns. If it looks good, try to select each column individually and see if you get a KeyError. If so, it could be something silly like extra whitespace in one of the column names. Commented Jul 23, 2015 at 18:00
  • @user131983 Why are you reading those files in that manner? Why not use pandas.read_excel? This method includes a number of arguments to control the parsing including a sheetname argument. Commented Jul 23, 2015 at 18:32
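
The whitespace check suggested in the comments above could be sketched roughly as follows. This is only an illustration: the helper names find_missing_columns and strip_column_names are hypothetical, and the small DataFrame is a made-up stand-in for the second sheet, not the asker's data.

```python
import pandas as pd


def find_missing_columns(df, wanted):
    """Return the wanted column names that are absent from df."""
    return [name for name in wanted if name not in df.columns]


def strip_column_names(df):
    """Return a copy of df whose column labels have surrounding whitespace removed."""
    cleaned = df.copy()
    cleaned.columns = [str(name).strip() for name in cleaned.columns]
    return cleaned


# Hypothetical stand-in for the second sheet: one header carries a trailing space.
sheet2 = pd.DataFrame({"asset_id": [1], "coupon ": [2.5], "closing_price": [99.0]})
wanted = ["asset_id", "coupon", "closing_price"]

# Printing the labels as a list shows their quotes, which exposes stray spaces.
print(sheet2.columns.tolist())

missing_before = find_missing_columns(sheet2, wanted)
missing_after = find_missing_columns(strip_column_names(sheet2), wanted)
print(missing_before, missing_after)
```

If the list of missing names becomes empty after stripping, the KeyError was most likely caused by padded headers in the pasted sheet.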

1 Answer

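import pandas as pd

def ExcelFileReader():
    xls = pd.ExcelFile('D:/USDataRECENTLY.xls')
    # Look up the sheet's position so it is selected by index rather than by name
    sheet_num = xls.sheet_names.index(xls.sheet_names[0])
    # The keyword is sheet_name in pandas 0.21 and later; older versions spelled it sheetname
    df = pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=sheet_num)
    return df[['months_to_maturity', 'asset_id', 'orig_iss_dt', 'maturity_dt', 'pay_freq_cd', 'coupon', 'closing_price']]

Alternatively, since sheet_num is simply 0 here, you could pass sheet_name=0 directly instead of looking the position up from xls.sheet_names[0].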

It looks like the issue is that your second sheet is named "Sheet1". Based on the ExcelFile.parse documentation, "Sheet1" would normally refer to the first sheet, but in your workbook it is the second one. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html

A better implementation would be:

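def mergingdataframes():
    # Passing a list of sheet positions returns a dict of DataFrames keyed by position
    # (the keyword is sheet_name in pandas 0.21 and later; it was sheetname before that)
    mergedf = pd.concat(pd.read_excel('D:/USDataRECENTLY.xls', sheet_name=[0, 1]))
    mergedf.index = mergedf.index.droplevel(0)  # drop the dict keys from the index
    return mergedf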
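
To see why the droplevel step is there, here is a minimal sketch that skips the Excel file entirely. It assumes a dict of two small, made-up DataFrames keyed by sheet position, which is the shape read_excel returns when given a list of sheets. The ignore_index variant at the end is an alternative, not part of the answer above.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets, keyed by sheet position.
sheets = {
    0: pd.DataFrame({"asset_id": [1, 2], "coupon": [2.5, 3.0]}),
    1: pd.DataFrame({"asset_id": [3], "coupon": [1.75]}),
}

# Concatenating a dict uses its keys as the outer level of a MultiIndex.
merged = pd.concat(sheets)

# Dropping level 0 removes the dict keys and leaves each sheet's own row labels.
flat = merged.copy()
flat.index = flat.index.droplevel(0)

# Alternative: discard the original row labels and renumber from zero.
renumbered = pd.concat(list(sheets.values()), ignore_index=True)

print(merged.index.nlevels, flat.index.tolist(), renumbered.index.tolist())
```

Note that after droplevel the row labels repeat across sheets, so ignore_index=True may be preferable when a unique index is needed.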