I have a folder with many xlsx files that I'd like to convert to csv files.

During my research, I found several threads about this topic, such as this or that one. Based on these, I wrote the following code using glob and pandas:

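import glob
import os

import pandas as pd

path = r'/Users/.../xlsx files'
excel_files = glob.glob(os.path.join(path, '*.xlsx'))

for excel in excel_files:
    # splitext strips only the final extension, so dots elsewhere in the path are safe
    out = os.path.splitext(excel)[0] + '.csv'
    df = pd.read_excel(excel)         # error occurs here
    df.to_csv(out)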

Unfortunately, I got the following error message, which I could neither interpret in this context nor figure out how to fix:

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/util/_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 336, in read_excel
    io = ExcelFile(io, storage_options=storage_options, engine=engine)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1131, in __init__
    self._reader = self._engines[engine](self._io, storage_options=storage_options)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 475, in __init__
    super().__init__(filepath_or_buffer, storage_options=storage_options)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 391, in __init__
    self.book = self.load_workbook(self.handles.handle)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 486, in load_workbook
    return load_workbook(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 317, in load_workbook
    reader.read()
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 281, in read
    apply_stylesheet(self.archive, self.wb)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'xfid'

Does anyone know how to fix this? Thanks a lot for your help!

  • Does this fail on all of your Excel files? Try adding print(excel) before the read call to see which file it is failing on. You could also add exception handling to skip over files that fail. Commented Jul 19, 2021 at 11:10
  • Are you running on Windows with Excel installed? If so, there could be an alternative approach. Commented Jul 19, 2021 at 11:11
  • It seems that it fails on all Excel files. The files are numbered consecutively. I added print(excel) as you suggested, and it gave me one of those numbers at random. After moving that file to another folder, it failed on another random file. I tried this on >25 files. No, I am running on macOS with Excel installed. Do you have any advice on how to deal with this? Commented Jul 19, 2021 at 12:32
  • You could try using the openpyxl.load_workbook() function directly (without pandas) and pass data_only=True. It probably won't help, but it is worth a try. On Windows you can automate Excel directly to load each file and save it in CSV format; this might be possible on a Mac too, but I've not tried it. Commented Jul 19, 2021 at 14:00
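The debugging steps suggested in these comments (print each file name, then skip files that fail) could look roughly like the sketch below. The function name convert_folder and its read_excel parameter are hypothetical and exist only for this illustration; by default the function uses pd.read_excel, as in the question.

```python
from pathlib import Path


def convert_folder(folder, read_excel=None):
    """Convert each .xlsx file in `folder` to .csv.

    Returns (converted, failed): a list of written CSV paths and a list of
    (xlsx_path, exception) pairs for the files that could not be read.
    `read_excel` is a hypothetical hook for swapping in another reader.
    """
    if read_excel is None:
        import pandas as pd  # imported lazily; only needed for the default reader
        read_excel = pd.read_excel

    converted, failed = [], []
    for xlsx in sorted(Path(folder).glob("*.xlsx")):
        print(xlsx)  # shows which file is being processed when an error occurs
        try:
            df = read_excel(xlsx)
            out = xlsx.with_suffix(".csv")
            df.to_csv(out)
        except Exception as exc:
            # Record the failure and carry on with the remaining files.
            failed.append((xlsx, exc))
            continue
        converted.append(out)
    return converted, failed
```

Calling convert_folder on the folder from the question and then inspecting the returned failed list would show whether every file triggers the same TypeError or only some of them.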

1 Answer

I had the same problem. After some hours of thinking and searching, I realized that the problem is actually the file itself. I opened it in MS Excel and saved it again. Alakazam, problem solved.

The file had been downloaded, so I thought it was a "security" error or just an error in how the file was created. xD

EDIT: It's not a security problem; it is an error in how the file was generated. The corrected file is about twice the size (in KB) of the broken one. One solution: with xlrd==1.2.0 the file can be opened, and you can then pass the resulting Book (the file opened by xlrd) to read_excel.

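import pandas as pd
import xlrd  # needs xlrd==1.2.0

# df = pd.read_excel('TabelaPrecos.xlsx')
# The line above gives the same result

# Open the workbook with xlrd first, then hand the Book to pandas
a = xlrd.open_workbook('TabelaPrecos.xlsx')
b = pd.read_excel(a)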
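
As a rough sketch of how this workaround could be applied only to files that fail, the helper below tries the normal read first and falls back to xlrd on a TypeError like the one in the question. The function name and the read_excel / open_workbook parameters are hypothetical and introduced only for illustration. It assumes xlrd==1.2.0 is installed (xlrd 2.x no longer reads .xlsx files) and a pandas version that still accepts that xlrd release.

```python
def read_excel_with_fallback(path, read_excel=None, open_workbook=None):
    """Read `path` with pandas; on TypeError, retry through an xlrd Book.

    `read_excel` and `open_workbook` are hypothetical hooks; they default to
    pd.read_excel and xlrd.open_workbook respectively.
    """
    if read_excel is None:
        import pandas as pd  # imported lazily; only needed for the default reader
        read_excel = pd.read_excel
    try:
        return read_excel(path)
    except TypeError:
        # The stylesheet error from the question surfaces as a TypeError.
        if open_workbook is None:
            import xlrd  # assumption: xlrd==1.2.0, as stated in the answer
            open_workbook = xlrd.open_workbook
        book = open_workbook(path)
        return read_excel(book)
```

In the loop from the question, df = read_excel_with_fallback(excel) would replace the direct pd.read_excel(excel) call.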

1 Comment

Author's file was Kendo UI. If you are having same bug,
