I am reading XML files into Python with the following code:

import xml.etree.ElementTree as ET
tree = ET.parse(file_name)

For some reason, the source I am reading from appears to specify the wrong encoding in the files. The declaration is correct for the first 10 years of data, but then I suddenly get problems with subsequent files.

Specifically, I get the following error:

xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30

I think the data is encoded in UTF-8, but the encoding declared in the file is UTF-16 (the first line of the file is <?xml version='1.0' encoding='UTF-16'?>). When I manually change the declaration to say UTF-8, no error is raised and, as far as I can tell, everything is read correctly.

How can I override the XML parser so that it treats the encoding as UTF-8 and ignores what is specified in the file?
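
For illustration, here is a minimal sketch of what such an override might look like. It relies on the `encoding` argument of `ET.XMLParser`, which the ElementTree documentation describes as overriding the encoding given in the XML declaration. The helper name `parse_as_utf8` and the sample document are made up for this example, and it assumes the file contents really are valid UTF-8.

```python
import io
import xml.etree.ElementTree as ET


def parse_as_utf8(source):
    """Parse XML from a path or binary file object, forcing UTF-8.

    parse_as_utf8 is a hypothetical helper name, not part of ElementTree.
    """
    # An explicit encoding on XMLParser takes precedence over the
    # encoding named in the file's XML declaration.
    parser = ET.XMLParser(encoding="utf-8")
    return ET.parse(source, parser=parser)


# Made-up sample: UTF-8 bytes whose declaration wrongly claims UTF-16.
sample = (
    "<?xml version='1.0' encoding='UTF-16'?>"
    "<root><item>café</item></root>"
).encode("utf-8")

tree = parse_as_utf8(io.BytesIO(sample))
root = tree.getroot()
```

The same helper should accept a file name in place of the `io.BytesIO` object, since `ET.parse` takes either. If some files genuinely are UTF-16, forcing UTF-8 would misread them, so this only makes sense for the files known to carry the wrong declaration.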
