I have an .xls file that actually contains XML data. The top portion of the file is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="sDT"><NumberFormat ss:Format="Short Date"/></Style>
</Styles>
<Worksheet ss:Name="XXX">
    <Table>
        <Row>
            <Cell><Data ss:Type="String">Request ID</Data></Cell>
            <Cell><Data ss:Type="String">Date</Data></Cell>
            <Cell><Data ss:Type="String">XXX ID</Data></Cell>
            <Cell><Data ss:Type="String">Customer Name</Data></Cell>
            <Cell><Data ss:Type="String">Amount</Data></Cell>
            <Cell><Data ss:Type="String">Requested Action</Data></Cell>
            <Cell><Data ss:Type="String">Status</Data></Cell>
            <Cell><Data ss:Type="String">Transaction ID</Data></Cell>
            <Cell><Data ss:Type="String">Merchant UTR</Data></Cell>
        </Row>

How can I read it into a Pandas DataFrame using pandas.read_xml? (Any other way of reading it into a DataFrame will also do.)

Note: I have already tried various solutions using read_excel, with and without engine="openpyxl". Each attempt produces a different error. (See the comments below, which also contain a link to the same problem faced by others earlier.)

  • Why are you trying to read it as xml? Commented Jul 8, 2022 at 15:38
  • @DeepSpace Because I am unable to read the file using pandas.read_excel or read_csv. Commented Jul 8, 2022 at 15:43
  • So ask about that (and provide the exact error that you get) instead of asking an XY problem. Commented Jul 8, 2022 at 15:45
  • I have made many attempts to read the file using read_excel. The file does not readily open in Excel; it opens only after dismissing a message about its format not matching the extension. See stackoverflow.com/questions/33470130/…. I have tried solutions like this and others. Hence, out of desperation, I am looking for a read_xml solution. Commented Jul 8, 2022 at 16:09
  • read_excel(filename) gives this error: "ValueError: Excel file format cannot be determined, you must specify an engine manually". If I add engine="openpyxl", it gives this error: "BadZipFile: File is not a zip file". Commented Jul 8, 2022 at 16:15

2 Answers


Your file is a valid XML file. I know of no automatic converter for it other than Excel, but it can easily be parsed as plain XML, for example with BeautifulSoup.

If the internal format is simple enough, you can just process the Worksheet, Row and Cell tags to convert it to a csv file:

from bs4 import BeautifulSoup
import csv
import io

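# html.parser lowercases tag names, hence 'worksheet', 'row' and 'cell' below
with open('file.xxx', encoding='utf-8') as fdin:
    soup = BeautifulSoup(fdin, 'html.parser')

# Open the output in write mode; newline='' lets the csv module control line endings
with open('file.csv', 'w', newline='') as fdout:
    wr = csv.writer(fdout)
    sheet = soup.find('worksheet')
    for row in sheet.find_all('row'):
        wr.writerow(cell.text for cell in row.find_all('cell'))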

With your sample data, it produces the expected output:

Request ID,Date,XXX ID,Customer Name,Amount,Requested Action,Status,Transaction ID,Merchant UTR

1 Comment

Sorry for the delay in responding. Your code did exactly what I wanted. I then read the csv file into a DataFrame. Thanks. Any input on what the code would look like if the Python xml module were used?
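On the standard xml module: the following is a minimal sketch using xml.etree.ElementTree, not a tested drop-in for your file. It assumes every element lives in the urn:schemas-microsoft-com:office:spreadsheet namespace shown in the question, that only the first Worksheet is wanted, and that cells are not sparse (attributes such as ss:Index are ignored). The helper name read_spreadsheetml is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the Workbook element shown in the question.
NS = {"ss": "urn:schemas-microsoft-com:office:spreadsheet"}


def read_spreadsheetml(source):
    """Return the first worksheet as a list of rows, each a list of cell strings.

    `source` is a file path or a binary file object.
    Assumption: cells appear in order with no gaps (ss:Index is ignored).
    """
    root = ET.parse(source).getroot()
    sheet = root.find("ss:Worksheet", NS)
    if sheet is None:
        return []
    rows = []
    for row in sheet.iterfind("ss:Table/ss:Row", NS):
        # A Cell without a Data child is treated as an empty string.
        rows.append([
            cell.findtext("ss:Data", default="", namespaces=NS)
            for cell in row.iterfind("ss:Cell", NS)
        ])
    return rows


# Hypothetical usage, with the first row as the header:
# rows = read_spreadsheetml("file.xls")
# header, body = rows[0], rows[1:]
```

If the first row is the header, as in the sample, the result can be handed to pandas with pandas.DataFrame(rows[1:], columns=rows[0]), which skips the intermediate csv file. All values arrive as strings, so numeric and date columns would still need converting.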

Try specifying another engine:

df = pd.read_excel('test.xls', engine='xlrd')

Note that you need to install the xlrd library, e.g.:

pip install xlrd

2 Comments

With engine='xlrd', the following error message appeared: "XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<?xml ve'"
Then your file is probably corrupted or stored in an incorrect format. I tried this approach locally with an xls file and it worked. I think your xls was generated by some software, and that the format was deliberately made incompatible with tools that read the data.
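The errors quoted in this thread (BadZipFile from openpyxl, "Expected BOF record; found b'<?xml ve'" from xlrd) all point at what the file starts with, so checking the leading bytes before picking a reader can save some guessing. Below is a rough sketch of that check. The function name sniff_spreadsheet_format and its return labels are made up for illustration; the signatures used are the zip local-file header (xlsx), the OLE2 compound-file header (binary xls) and an XML declaration.

```python
def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    The returned labels are illustrative, not pandas engine names.
    """
    if head.startswith(b"PK\x03\x04"):
        return "zip"   # xlsx and other zip-based formats
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"  # legacy binary xls
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<?xml"):
        return "xml"   # e.g. an XML spreadsheet saved with an .xls extension
    return "unknown"


# Hypothetical usage:
# with open("test.xls", "rb") as f:
#     print(sniff_spreadsheet_format(f.read(16)))
```

A result of "xml" would mean that neither the xlrd nor the openpyxl engine applies and the file has to be parsed as XML, as in the other answer.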
