Pandas: parsing corrupted .xls file

Question

I'm using pandas to read .xls files and extract their tables into DataFrames. I can open the files with Excel, but it first shows a pop-up: ".xls file cannot be accessed. The file may be corrupted, located on a server that is not responding, or read-only."

In the file's general properties, the type is Microsoft Excel 97-2003 Worksheet (.xls).

Code:

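import glob
import os
import pandas as pd

# Folder containing this script; the .xls files sit next to it
file_path = os.path.dirname(os.path.abspath(__file__))

# Open only the .xls files in that folder
names = sorted(glob.glob(os.path.join(file_path, "*.xls")))
excels = [pd.ExcelFile(name) for name in names]  # Error is raised here

# Read the first sheet of each workbook, without a header row or index column
frames = [x.parse(x.sheet_names[0], header=None, index_col=None) for x in excels]

# Combine the sheets into one DataFrame and write a single output file
df = pd.concat(frames, ignore_index=True)
df.to_excel("Final.xls", header=False, index=False)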

Error:

pd.ExcelFile(name) :

    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xc1\xc5  \t\xc7\xed\xcf'

or (with read_html):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\html.py", line 545, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

However, as the error message says, the first 8 bytes of the file are '\xc1\xc5' ..., which is definitely not the Excel .xls format.

Is there any way to process such files?

What do you expect it to do? If YOU can't even figure out what the data is, then you can't tell the computer how to do it. I would point out that you are not limiting your search to .xls files. You're trying to open everything.
– Tim Roberts, Commented Mar 18, 2021 at 20:55
@TimRoberts Good point, thanks. I fixed the code to make it clearer. I have only .xls files in the folder, so I am reading only those. How can I figure out what the data is if the file opens normally in Excel and in hex editors?
– 干猕猴桃, Commented Mar 18, 2021 at 21:09
Ah, I missed the fact that Excel is able to open it after complaining. The old Office documents all start with hex D0 CF 11 E0. Without having the file, I couldn't guess.
– Tim Roberts, Commented Mar 18, 2021 at 21:55
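
The signature check mentioned in the comments can be sketched in a few lines. This is only an illustration: the function name `sniff_format` and its return labels are made up for this example, and the list of signatures is deliberately short (legacy Office compound files, ZIP containers such as .xlsx, and text markup such as HTML or XML saved under an .xls name).

```python
# Leading bytes ("magic numbers") of formats that commonly carry an .xls name.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Office compound file (.xls, .doc)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP container, e.g. .xlsx


def sniff_format(path):
    """Guess the real container format of a file from its first bytes.

    Hypothetical helper for illustration; returns one of
    "ole2", "zip", "markup" or "unknown".
    """
    with open(path, "rb") as handle:
        head = handle.read(512)

    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"

    # Skip a UTF-8 byte order mark and leading whitespace, then look for markup.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<html", b"<!doctype", b"<table", b"<?xml")):
        return "markup"

    return "unknown"
```

A file that begins with the bytes from the error message above matches none of these signatures, so it would be reported as "unknown". In that case the format has to be identified some other way before pandas can be told how to read it.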

Dharman · Accepted Answer · 2021-04-13 14:36:09Z

0

Although I am new to pandas, the first thing I noticed is a typo in the line below: it should be "pd.read_excel", not "pd.read_exel".

excels = [pd.read_exel(name) for name in file_path]

The second thing I can say is that corrupted .xls files can sometimes be read with "pd.read_html()", which works when the file is really an HTML table saved with an .xls extension. I hope it helps.
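
A minimal sketch of that idea is to try the Excel reader first and fall back to the HTML reader. The helper name `read_with_fallback` and its `readers` parameter are hypothetical, introduced only for this example. Note that the question reports `read_html` also failing with "No tables found" for this particular file, so the fallback is not guaranteed to help.

```python
def read_with_fallback(path, readers=None):
    """Try each reader on `path` in order and return the first result.

    Hypothetical helper for illustration. By default it tries
    pd.read_excel, then the first table found by pd.read_html.
    """
    if readers is None:
        import pandas as pd  # imported lazily so custom readers work without pandas

        readers = [
            lambda p: pd.read_excel(p, header=None),
            lambda p: pd.read_html(p)[0],  # read_html returns a list of DataFrames
        ]

    errors = []
    for reader in readers:
        try:
            return reader(path)
        except Exception as exc:  # broad on purpose: each parser raises its own error types
            errors.append(f"{type(exc).__name__}: {exc}")

    # Every reader failed; report all the reasons together.
    raise ValueError(f"Could not read {path!r}: " + "; ".join(errors))
```

Calling `read_with_fallback("report.xls")` (a placeholder file name) would return a DataFrame if either reader succeeds, and otherwise raise a single ValueError that lists why each reader failed.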

edited Apr 13, 2021 at 14:36 by Dharman

answered Apr 13, 2021 at 14:30 by OYTUN ORAL
