Pandas read_excel Error: list index out of range

Question

I'm trying to read an Excel file (.xlsx) and I'm getting the error "IndexError: list index out of range". My code is simple:

import pandas as pd
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')

The error:


  File "<ipython-input-16-fd0112985376>", line 2, in <module>
    pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 364, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1233, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 522, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 420, in __init__
    self.book = self.load_workbook(self.handles.handle)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 533, in load_workbook
    return load_workbook(

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
    reader.read()

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
    apply_stylesheet(self.archive, self.wb)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
    return cls(**attrib)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 94, in __init__
    self.named_styles = self._merge_named_styles()

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 114, in _merge_named_styles
    self._expand_named_style(style)

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 124, in _expand_named_style
    xf = self.cellStyleXfs[named_style.xfId]

  File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\cell_style.py", line 185, in __getitem__
    return self.xf[idx]

IndexError: list index out of range

Maybe the Excel file is not properly formatted, as stated here and here. But when I save the Excel file manually, its file type is "Excel Workbook (*.xlsx)", not "Strict Open XML Spreadsheet (*.xlsx)" as in those other questions:

Screenshot: saving the Excel file manually

I downloaded this file from the web, so maybe the file is broken, but I don't know how to check it.
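One way to check a downloaded file without Excel is to look at its first bytes instead of trusting the extension. The sketch below is an assumption-laden helper (the name sniff_excel_file and its return labels are made up for illustration): a real .xlsx is a ZIP archive containing xl/workbook.xml, a legacy .xls starts with the OLE2 signature, and some websites serve an HTML table renamed to .xls/.xlsx. A truncated download that still starts with the ZIP signature will make zipfile raise BadZipFile, which is itself a sign the file is broken.

```python
import zipfile
from pathlib import Path

# Signatures ("magic bytes") at the start of the two binary Excel containers.
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx/.xlsm: Office Open XML is a ZIP archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy binary .xls (OLE2 compound file)


def sniff_excel_file(path):
    """Guess what a downloaded 'Excel' file really contains, ignoring its extension."""
    path = Path(path)
    with path.open("rb") as fh:
        head = fh.read(512)
    if head.startswith(ZIP_MAGIC):
        # Any ZIP starts this way; a real workbook also contains xl/workbook.xml.
        with zipfile.ZipFile(path) as zf:
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx"
        return "zip (not an xlsx workbook)"
    if head.startswith(OLE2_MAGIC):
        return "xls"
    # Some sites serve an HTML table (or XML) renamed to .xls/.xlsx.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<!doctype", b"<html", b"<table", b"<?xml")):
        return "html/xml text"
    return "unknown"
```

For example, sniff_excel_file(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx') returning "xlsx" would mean the container is fine and the problem is inside the workbook (the traceback points at the stylesheet), not the download itself.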

Thanks for your attention!

Edit 1:

Here is a screenshot of the website's HTML:

I don't know HTML, but I found it strange that the link's id is "xls-link" while its href is "./Html/DIARIO_16-11-2021.xlsx".

As @Wayne said, when he downloaded the file, it came as .xls. After reading this answer, I tried running

pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls')

And got the error "[Errno 2] No such file or directory: 'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls' "

Then I opened the file manually and saved it as .xls. After that, the code above actually worked!

But now my problem is that I would have to manually open and save as .xls all 5,000+ daily files I need, which is tedious. Does anyone know how I could do this automatically (without actually opening each file, since I still can't figure that out)?
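A hedged sketch for the batch part: rather than re-saving every file in Excel, try each file with pandas' default engine and, only when openpyxl raises the IndexError from the traceback, retry with engine="calamine". This assumes pandas 2.2 or newer (released after this question was asked) with the python-calamine package installed, and it is not guaranteed that calamine accepts these particular files. The helper names, the folder path and the DIARIO_*.xlsx glob pattern are illustrative assumptions based on the file name in the question.

```python
from pathlib import Path

import pandas as pd


def read_daily_file(path):
    """Read one workbook, retrying with the calamine engine if openpyxl fails."""
    try:
        # For .xlsx files pandas uses openpyxl by default; the traceback in the
        # question is openpyxl failing while applying the workbook's stylesheet.
        return pd.read_excel(path)
    except IndexError:
        # Assumption: pandas >= 2.2 and `pip install python-calamine`.
        # calamine parses the file with its own reader and may accept a
        # stylesheet that openpyxl rejects; this is not guaranteed.
        return pd.read_excel(path, engine="calamine")


def read_folder(folder, pattern="DIARIO_*.xlsx"):
    """Read every matching file; return ({name: DataFrame}, {name: error})."""
    frames, failures = {}, {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            frames[path.name] = read_daily_file(path)
        except Exception as exc:  # record the problem and move on to the next file
            failures[path.name] = repr(exc)
    return frames, failures
```

Called as read_folder(r"M:\PUBLIC\Felipe Dias\ONS"), it returns a dict of DataFrames keyed by file name plus a dict of errors, so one bad download does not stop the whole run; only the files left in failures would still need to be re-saved by hand.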

I take it there is a lot of code you are attempting on the one file?
– OldManSeph, Commented Nov 17, 2021 at 20:57
@stefan_aus_hannover I don't know what you mean, but I'm doing web scraping: I download a bunch of files and then (try to) read them one by one. But I guess it doesn't matter, because even with only these 2 lines of code I get the same error.
– femdias, Commented Nov 17, 2021 at 21:01
Your error shows at least two places the error could have come from. It's generic, especially because it's not giving you a line that it happened on.
– OldManSeph, Commented Nov 17, 2021 at 21:03
@stefan_aus_hannover Exactly, it's a super generic error message.
– femdias, Commented Nov 17, 2021 at 21:05
I opened it in Excel, saved it as .xls, and then xls = pd.ExcelFile('DIARIO_16-11-2021.xls') worked. I could see the sheet names listed using xls.sheet_names. Based on stackoverflow.com/a/61939376/8508004 and stackoverflow.com/questions/26521266/…
– Wayne, Commented Nov 17, 2021 at 21:21

Li-Pin Juan · Accepted Answer · 2023-10-22 14:19:36Z


Cause: In my case, the error occurred because I had saved my Excel file as "Strict Open XML Spreadsheet (*.xlsx)" rather than "Excel Workbook (*.xlsx)". Pandas read_excel fails on the former but works smoothly with the latter.

Solution: To fix it, open the original Excel file and save it as a new file with the file type "Excel Workbook (*.xlsx)". Both types use the .xlsx extension, so the file type chosen in the Save As dialog is what matters.
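Since both formats share the .xlsx extension, you can't tell them apart by name. A minimal sketch for checking which one you have, assuming the workbook part lives at xl/workbook.xml (where Excel writes it; strictly, the package's _rels/.rels is the authoritative pointer): Strict files declare the main namespace http://purl.oclc.org/ooxml/spreadsheetml/main, while "Excel Workbook" (Transitional) files use http://schemas.openxmlformats.org/spreadsheetml/2006/main. The function name xlsx_conformance is hypothetical.

```python
import zipfile

# Main SpreadsheetML namespace used by each Office Open XML conformance class.
TRANSITIONAL_NS = b"http://schemas.openxmlformats.org/spreadsheetml/2006/main"
STRICT_NS = b"http://purl.oclc.org/ooxml/spreadsheetml/main"


def xlsx_conformance(path, part="xl/workbook.xml"):
    """Return 'strict', 'transitional' or 'unknown' for an .xlsx file."""
    with zipfile.ZipFile(path) as zf:
        workbook_xml = zf.read(part)  # raises KeyError if the part is missing
    if STRICT_NS in workbook_xml:
        return "strict"
    if TRANSITIONAL_NS in workbook_xml:
        return "transitional"
    return "unknown"
```

Running this over a folder of downloads would show whether the Strict format explains the failures, or whether (as in the question, where Excel reports "Excel Workbook") the cause lies elsewhere.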

answered Oct 22, 2023 at 14:19

Li-Pin Juan
