Python converting xlsx to csv

Question

I am trying to convert all the xlsx files in a folder to csv files. This worked well in the past, but this time I am getting an error that I cannot make sense of.

Here's my code:

import glob
import pandas as pd

excel_files = glob.glob('/*xlsx*')

for excel_file in excel_files:
    df = pd.read_excel(excel_file)
    output = excel_file.split('.')[0]+'.csv'
    df.to_csv(output)

I have also tried the following line to rule out an encoding issue:

df.to_csv(output, encoding='utf-8', index=False)

It converted around 1000 files, but the rest of the 7000 files kept failing with this error:

KeyError: 'rId6'

How would you solve it? Thank you.

Your data is at fault, not the mass conversion. Isolate one file that doesn't work and check it.
– Jean-François Fabre ♦, Commented Mar 18, 2018 at 20:04
You will get more and better answers if you create a Minimal, Complete, and Verifiable example. Especially make sure that the input and expected test data are complete (not pseudo-data), and can be easily cut and pasted into an editor to allow testing proposed solutions.
– Stephen Rauch ♦, Commented Mar 18, 2018 at 20:04
I would suggest comparing a working file against the files that did not work.
– Niels, Commented Mar 18, 2018 at 20:14
Catch the exception, and print the name of the currently processed file when there's an exception.
– Jean-François Fabre ♦, Commented Mar 18, 2018 at 20:35
But you're not getting an answer to this question, because nobody can answer it unless they can somehow guess what's in your data files. So, assuming you need to solve this, don't just keep that advice in mind for future questions: apply it to this question, gather the information, and edit it in, so someone can help you.
– abarnert, Commented Mar 18, 2018 at 20:40

Martin Evans · Accepted Answer · 2018-03-20 14:33:48Z

Some of your files are probably malformed in some way. You should add exception handling to your loop. This lets the remaining conversions continue and shows which of your files are causing the problem:

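import glob
import os
import pandas as pd

excel_files = glob.glob('/*xlsx*')

for excel_file in excel_files:
    print("Converting '{}'".format(excel_file))
    try:
        df = pd.read_excel(excel_file)
        # splitext strips only the final extension, even if the path contains other dots
        output = os.path.splitext(excel_file)[0] + '.csv'
        df.to_csv(output)
    except KeyError as e:
        # Report the failure and carry on with the next file
        print("  Failed to convert: KeyError {}".format(e))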
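
A variation on the loop above, as a minimal sketch: it records each failing file together with the exception text, so the bad files can be examined afterwards. The names `convert_all` and `xlsx_to_csv` are hypothetical helpers, not part of pandas, and catching `Exception` rather than only `KeyError` is an assumption about which errors you want the batch to survive.

```python
import os


def xlsx_to_csv(excel_file):
    """Convert the first sheet of one workbook to a CSV next to the source file."""
    import pandas as pd  # imported lazily so convert_all itself does not need pandas

    df = pd.read_excel(excel_file)
    # splitext strips only the final extension, so other dots in the path are safe
    output = os.path.splitext(excel_file)[0] + '.csv'
    df.to_csv(output, index=False)
    return output


def convert_all(excel_files, convert_one=xlsx_to_csv):
    """Run convert_one on every path; return (path, error text) for each failure."""
    failures = []
    for excel_file in excel_files:
        try:
            convert_one(excel_file)
        except Exception as exc:  # assumption: keep going on any per-file error
            failures.append((excel_file, '{}: {}'.format(type(exc).__name__, exc)))
    return failures
```

Passing the list returned by `glob.glob` to `convert_all` and printing the result would give the names of the files worth opening by hand.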

You could then try opening the failing files in Excel to see whether they load correctly. If they do load, you could upload an example of a failing Excel file to something like pastebin and add a comment here with a link to it so the problem can be reproduced.
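
Another way to examine a failing file without Excel, sketched with the standard library only. An .xlsx file is a zip archive, and each sheet listed in `xl/workbook.xml` points through an `r:id` attribute to an entry in `xl/_rels/workbook.xml.rels`. It is an assumption, not something confirmed in this thread, that `KeyError: 'rId6'` comes from such an id being absent from the relationships file. The helper name `missing_sheet_rel_ids` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by ordinary .xlsx workbooks
MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'
DOC_REL_NS = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'
PKG_REL_NS = 'http://schemas.openxmlformats.org/package/2006/relationships'


def missing_sheet_rel_ids(xlsx_path):
    """Return sheet r:id values from workbook.xml that workbook.xml.rels does not define."""
    with zipfile.ZipFile(xlsx_path) as archive:
        workbook = ET.fromstring(archive.read('xl/workbook.xml'))
        rels = ET.fromstring(archive.read('xl/_rels/workbook.xml.rels'))

    # Ids that the relationships part actually defines
    defined = {rel.get('Id') for rel in rels.iter('{%s}Relationship' % PKG_REL_NS)}
    # Ids that the workbook's sheet entries refer to, in document order
    referenced = [
        sheet.get('{%s}id' % DOC_REL_NS)
        for sheet in workbook.iter('{%s}sheet' % MAIN_NS)
    ]
    return [rid for rid in referenced if rid not in defined]
```

A non-empty result for a failing file, and an empty one for a working file, would support that assumption. If both come back empty, the cause lies elsewhere and comparing the two archives part by part is the next step.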
