I'm using pandas to read .xls files and extract their tables into a DataFrame. (I can open the file in Excel, but it first shows a pop-up: ".xls file cannot be accessed. The file may be corrupted, located on a server that is not responding, or read-only.")

In the file's General properties, its type is Microsoft Excel 97-2003 Worksheet (.xls).

Code:

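import glob
import os
import pandas as pd

# Folder that contains this script and the .xls files
file_path = os.path.dirname(os.path.abspath(__file__))

# Read the first sheet of every .xls file in that folder
names = glob.glob(os.path.join(file_path, "*.xls"))
frames = [pd.read_excel(name, header=None, index_col=None) for name in names]  # Error

# Stack the tables and write them to a single file
df = pd.concat(frames, ignore_index=True)
df.to_excel("Final.xls", header=False, index=False)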

Error:

pd.ExcelFile(name) :

    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xc1\xc5  \t\xc7\xed\xcf'

or (with read_html)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\html.py", line 545, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

However, as the error message says, the first 8 bytes of the file are '\xc1\xc5' ..., which is definitely not the Excel .xls format.

Is there any way to process such files?

  • What do you expect it to do? If YOU can't even figure out what the data is, then you can't tell the computer how to do it. I would point out that you are not limiting your search to .xls files. You're trying to open everything. Commented Mar 18, 2021 at 20:55
  • @TimRoberts Good point, thanks, I fixed the code to make it more appealing. I have only .xls files in the folder, so I am reading only them. How can I figure out what the data is if the file opens normally and works in Excel / hex editors? Commented Mar 18, 2021 at 21:09
  • Ah, I missed the fact that Excel is able to open it after complaining. The old Office documents all start with hex D0 CF 11 E0. Without having the file, I couldn't guess. Commented Mar 18, 2021 at 21:55
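
The signature check mentioned in the last comment can be sketched in a few lines of standard-library Python. This is only an illustration: the helper names `sniff_format` and `sniff_file` are made up here, and the list of signatures is deliberately short (the full 8-byte OLE2 header that begins with D0 CF 11 E0, the zip header used by .xlsx, and a rough test for HTML/XML text).

```python
# Leading bytes ("magic numbers") of formats that often sit behind an .xls extension.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy Office 97-2003 container
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx and other zip-based files


def sniff_format(head: bytes) -> str:
    """Guess the real format from the first bytes of a file (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip a UTF-8 byte-order mark and leading whitespace before looking for markup.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<html", b"<!doctype", b"<table", b"<?xml")):
        return "markup"
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 64 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(64))
```

For the bytes shown in the error message, `sniff_format(b'\xc1\xc5  \t\xc7\xed\xcf')` falls through to "unknown", which matches what xlrd reports: the file is not a plain OLE2 workbook, so some other format (or encoding) is hiding behind the extension.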

1 Answer

Although I am new to pandas, the first thing I notice is a typo in the line below. It is not a syntax error as such, but the function name is misspelled; it should have been "pd.read_excel".

excels = [pd.read_exel(name) for name in file_path]

The second thing I can say is that some files with an .xls extension are really HTML tables, and those can be read with "pd.read_html()". I hope it helps.
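
One way to combine both suggestions is to try the Excel reader first and fall back to the HTML reader. The sketch below assumes pandas with its Excel and HTML parser dependencies is installed; `try_readers` and `read_disguised_xls` are hypothetical names, not pandas functions. Note that in the question `read_html` already failed with "No tables found", so this fallback only helps when the file really is an HTML table.

```python
def try_readers(path, readers):
    """Call each (label, reader) pair in order; return (label, result) from the first that works."""
    errors = []
    for label, reader in readers:
        try:
            return label, reader(path)
        except Exception as exc:  # broad on purpose: each parser raises its own error types
            errors.append(f"{label}: {exc}")
    raise ValueError(f"no reader could parse {path!r}: " + "; ".join(errors))


def read_disguised_xls(path):
    """Hypothetical helper: try pandas' Excel reader, then its HTML reader."""
    import pandas as pd  # imported lazily so try_readers stays usable without pandas

    return try_readers(path, [
        # read_excel returns one DataFrame; wrap it so both readers yield a list
        ("excel", lambda p: [pd.read_excel(p, header=None)]),
        # read_html returns a list of DataFrames, one per <table> found
        ("html", lambda p: pd.read_html(p)),
    ])
```

If every reader fails, the raised ValueError lists each parser's message, which makes it easier to see whether the file is unsupported or simply not a table format at all.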
