I'm trying to read an Excel file (.xlsx) and I'm getting the error "IndexError: list index out of range". My code is simple:

import pandas as pd
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')

The error:


  File "<ipython-input-16-fd0112985376>", line 2, in <module>
    pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 364, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1233, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 522, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 420, in __init__
    self.book = self.load_workbook(self.handles.handle)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 533, in load_workbook
    return load_workbook(

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
    reader.read()

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
    apply_stylesheet(self.archive, self.wb)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
    return cls(**attrib)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 94, in __init__
    self.named_styles = self._merge_named_styles()

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 114, in _merge_named_styles
    self._expand_named_style(style)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 124, in _expand_named_style
    xf = self.cellStyleXfs[named_style.xfId]

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\cell_style.py", line 185, in __getitem__
    return self.xf[idx]

IndexError: list index out of range

Maybe the Excel file is not properly formatted, as stated here and here. But when I save the Excel file manually, the file type is "Excel Workbook (*.xlsx)", not "Strict Open XML Spreadsheet (*.xlsx)" as in those other questions:

Print Screen: saving the Excel file manually

I downloaded this file from the web, so maybe the file is broken, but I don't know how to check it.
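One way to check this, going by the traceback: the last frames look up `self.cellStyleXfs[named_style.xfId]`, so a named style whose `xfId` points past the end of the `cellStyleXfs` list would raise exactly this IndexError. The sketch below uses only the standard library and makes two assumptions: the styles part lives at `xl/styles.xml`, and it uses the usual (non-strict) SpreadsheetML namespace. `find_bad_named_styles` is a hypothetical helper name, not part of pandas or openpyxl.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of transitional (non-strict) SpreadsheetML parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def find_bad_named_styles(xlsx):
    """Return (style_name, xfId, xf_count) for each out-of-range named style.

    `xlsx` may be a path or a binary file-like object. A zipfile.BadZipFile
    error here means the file is not an .xlsx (ZIP) container at all.
    """
    with zipfile.ZipFile(xlsx) as archive:
        # Assumption: the styles part is stored at its conventional location.
        root = ET.fromstring(archive.read("xl/styles.xml"))

    ns = {"m": MAIN_NS}
    # Number of <xf> records that named styles are allowed to reference.
    xf_count = len(root.findall("m:cellStyleXfs/m:xf", ns))

    bad = []
    for style in root.findall("m:cellStyles/m:cellStyle", ns):
        xf_id = int(style.get("xfId", "0"))
        if xf_id >= xf_count:
            bad.append((style.get("name"), xf_id, xf_count))
    return bad
```

An empty list means every named style resolves; any returned tuple names a style that would trigger the error above.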

Thanks for your attention!

Edit 1:

Here is a screenshot of the website's HTML

I don't know HTML, but I found it strange that the link's id is "xls-link" while its href is "./Html/DIARIO_16-11-2021.xlsx".

As @Wayne said, when he downloaded the file, it came as .xls. After reading this answer, I tried running

pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls')

And got the error "[Errno 2] No such file or directory: 'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls' "

Then I tried to open the file manually and save it as .xls. After running the code above, it actually worked!

But now my problem is that I would have to manually open and re-save as .xls all 5000+ daily files that I need, which is tedious. Does anyone know how I could do this automatically (without actually opening each file, because I still can't figure that out)?
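One possible way to automate the "open and save as .xls" step is to drive Excel itself through COM. This is only a sketch under stated assumptions: it needs Windows, an installed Excel, and the pywin32 package, and it still opens each file in a hidden Excel instance rather than avoiding Excel. `plan_conversions` and `convert_with_excel` are hypothetical helper names.

```python
from pathlib import Path


def plan_conversions(folder, src_suffix=".xlsx", dst_suffix=".xls"):
    """Pair each source workbook in `folder` with its target path.

    Files whose target already exists are skipped, so an interrupted
    batch can be resumed.
    """
    base = Path(folder).resolve()  # absolute paths, as Excel expects
    pairs = []
    for src in sorted(base.glob("*" + src_suffix)):
        dst = src.with_suffix(dst_suffix)
        if not dst.exists():
            pairs.append((src, dst))
    return pairs


def convert_with_excel(pairs):
    """Re-save each workbook through one hidden Excel instance (Windows only)."""
    import win32com.client  # from pywin32; imported lazily on purpose

    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = False
    excel.DisplayAlerts = False
    try:
        for src, dst in pairs:
            workbook = excel.Workbooks.Open(str(src))
            # 56 is xlExcel8, the Excel 97-2003 (.xls) file format.
            workbook.SaveAs(str(dst), FileFormat=56)
            workbook.Close(False)
    finally:
        excel.Quit()
```

Usage would look like `convert_with_excel(plan_conversions(r'M:\PUBLIC\Felipe Dias\ONS'))`. Keeping the planning step separate makes it possible to inspect the list of files before Excel touches anything.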

  • I take it there is a lot of code you are attempting on the one file? Commented Nov 17, 2021 at 20:57
  • @stefan_aus_hannover I don't know what you mean, but I'm web scraping: I download a bunch of files and then try to read them one by one. But I guess it doesn't matter, because even with only these 2 lines of code I get the same error. Commented Nov 17, 2021 at 21:01
  • Your error shows at least two places that the error could have come from. It's generic, especially because it's not giving you the line it happened on. Commented Nov 17, 2021 at 21:03
  • @stefan_aus_hannover Exactly, it's a super generic error message. Commented Nov 17, 2021 at 21:05
  • I opened in Excel and saved it as .xls and then xls = pd.ExcelFile('DIARIO_16-11-2021.xls') worked. I could see the sheet names listed using xls.sheet_names. Based on stackoverflow.com/a/61939376/8508004 and stackoverflow.com/questions/26521266/… Commented Nov 17, 2021 at 21:21

1 Answer

Cause: In my case, the error occurred because I had saved my Excel file as "Strict Open XML Spreadsheet (*.xlsx)" rather than "Excel Workbook (*.xlsx)". pandas read_excel fails on the former but works smoothly with the latter.

Solution: To fix it, open the original Excel file and save it as a new file with the file type "Excel Workbook (*.xlsx)".
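Both file types share the .xlsx extension, so the extension alone cannot tell them apart. A rough way to check which one a file is: look at the XML namespace of the workbook part, since Strict files use the `purl.oclc.org` namespace. This sketch uses only the standard library and assumes the workbook part is stored at the conventional `xl/workbook.xml`. `workbook_flavour` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

STRICT_NS = "http://purl.oclc.org/ooxml/spreadsheetml/main"
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def workbook_flavour(xlsx):
    """Return 'strict', 'transitional', or 'unknown' for an .xlsx file.

    `xlsx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        # Assumption: the workbook part is at its conventional location.
        root = ET.fromstring(archive.read("xl/workbook.xml"))

    # ElementTree renders a namespaced tag as "{namespace}localname".
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```

A result of "strict" suggests re-saving as "Excel Workbook (*.xlsx)" is worth trying; "transitional" suggests the problem lies elsewhere.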
