I am trying to convert all xlsx files in a folder to csv files. This worked well in the past, but this time I am getting an error that gives me no clue about the cause.

Here's my code:

import glob

import pandas as pd

excel_files = glob.glob('/*xlsx*')

for excel_file in excel_files:
    df = pd.read_excel(excel_file)
    output = excel_file.split('.')[0]+'.csv'
    df.to_csv(output)

I have also tried the following line to make sure it's not an encoding issue:

df.to_csv(output, encoding='utf-8', index=False)

It converted around 1000 files, but the rest of the 7000 files kept failing with this error:

KeyError: 'rId6'

How would you solve it? Thank you.

  • Your data is at fault, not the bulk conversion. Isolate one file that doesn't work and check it. Commented Mar 18, 2018 at 20:04
  • You will get more and better answers if you create a Minimal, Complete, and Verifiable example. Especially make sure that the input and expected test data are complete (not pseudo-data), and can be easily cut and pasted into an editor to allow testing proposed solutions. Commented Mar 18, 2018 at 20:04
  • I would suggest comparing a working file against the files that did not work. Commented Mar 18, 2018 at 20:14
  • Catch the exception, and print the name of the currently processed file when there's an exception. Commented Mar 18, 2018 at 20:35
  • But you're not getting an answer to this question, because nobody can answer it unless they can somehow guess what's in your data files. So, assuming you need to solve this, don't just keep that advice in mind for future questions, apply it to this question, gather the information, and edit it in, so someone can help you. Commented Mar 18, 2018 at 20:40
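
One way to act on the suggestion to compare a working file against a failing one: an .xlsx file is a zip archive, so the parts inside two files can be listed and diffed with the standard library alone. The sketch below is only an illustration. The helper names diff_parts and workbook_rel_ids are hypothetical, and the idea that 'rId6' is a relationship ID expected in xl/_rels/workbook.xml.rels is an assumption about the cause, not something the question confirms.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by OOXML package relationship files (.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_parts(xlsx_path):
    """Return the sorted names of the parts inside an .xlsx (zip) package."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return sorted(zf.namelist())


def diff_parts(good_path, bad_path):
    """Return (only_in_good, only_in_bad) as two sorted lists of part names."""
    good = set(list_parts(good_path))
    bad = set(list_parts(bad_path))
    return sorted(good - bad), sorted(bad - good)


def workbook_rel_ids(xlsx_path):
    """Return the relationship IDs declared in xl/_rels/workbook.xml.rels.

    Assumption: the workbook keeps its relationships in the usual location.
    A KeyError from zf.read() here means that part is missing entirely.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read("xl/_rels/workbook.xml.rels"))
    return sorted(rel.get("Id") for rel in root.iter(REL_NS + "Relationship"))
```

If a failing file lacks a part that every working file has, or its list of relationship IDs does not contain the ID named in the error, that narrows down what is wrong with the data.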

1 Answer

Some of your files are badly formatted in some way. You should add exception handling to your loop. This allows the conversions to continue and shows which of your files are causing the problem:

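import glob
import os

import pandas as pd

excel_files = glob.glob('/*xlsx*')

for excel_file in excel_files:
    print("Converting '{}'".format(excel_file))
    try:
        df = pd.read_excel(excel_file)
        # splitext strips only the final extension, so dots elsewhere in the path are safe
        output = os.path.splitext(excel_file)[0] + '.csv'
        df.to_csv(output)
    except KeyError as e:
        # Report the failure and carry on with the next file
        print("  Failed to convert: {!r}".format(e))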

You could then try opening the failing files in Excel to see if they load OK. If they do load, you could upload an example of a failing Excel file to something like pastebin and add a comment here with the link to it so the problem can be recreated.
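
With thousands of files, scrolling through printed output to find the failures gets tedious. A possible variation, sketched below, collects the failing paths and their errors in a list so they can be reviewed or opened in Excel afterwards. The names convert_all and excel_to_csv are hypothetical helpers, not part of pandas, and catching Exception broadly is a deliberate choice here so that errors other than KeyError are recorded too.

```python
import glob
import os


def convert_all(paths, convert):
    """Call convert(path) for every path; return [(path, error_text), ...] for failures."""
    failures = []
    for path in paths:
        try:
            convert(path)
        except Exception as exc:  # broad on purpose: record any failure, then continue
            failures.append((path, repr(exc)))
    return failures


def excel_to_csv(path):
    """Convert one Excel file to a CSV file next to it."""
    # Imported here so convert_all stays usable without pandas installed.
    import pandas as pd

    df = pd.read_excel(path)
    df.to_csv(os.path.splitext(path)[0] + '.csv', index=False)


if __name__ == "__main__":
    # Same glob pattern as in the question.
    for path, error in convert_all(glob.glob('/*xlsx*'), excel_to_csv):
        print("Failed: {} ({})".format(path, error))
```

Passing the converter in as an argument keeps the loop independent of pandas, which also makes it easy to try the loop on a handful of files with a stand-in function first.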
