The title of this question is the same as that of another question, but my problem is different.

I have data read from a tar.gz archive using tarfile.TarFile.extractfile() and stored in an xml.etree.ElementTree.Element. The data is utf-8 encoded.

When I attempt to use xml.etree.ElementTree.ElementTree.write() to put it in a file, I get this:

Traceback (most recent call last):
  File "../python/GetDoc.py", line 150, in <module>
    ET.ElementTree(elements).write(args.outfile, encoding="us-ascii", method="text")
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    _serialize_text(write, self._root, encoding)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1007, in _serialize_text
    write(part.encode(encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
  • How are you reading the data from the file? Are you properly decoding it from UTF-8 to unicode objects? Commented Jul 18, 2012 at 2:26
  • That was quick! Indeed that seems to have been the problem. I had thought naively that ElementTree would be able to deal with UTF-8 strings. Go ahead and write an answer and I'll accept it (: Commented Jul 18, 2012 at 2:29

2 Answers

How are you reading the data from the file? Are you properly decoding it from UTF-8 to unicode objects? ElementTree will want unicode objects, not bytestrings encoded in UTF-8.

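I was able to solve this by converting my strings to unicode using myString.decode('utf-8') before adding them to my ElementTree.Element object. ET.ElementTree.write() calls .encode() on each piece of text, and in Python 2 calling .encode() on a byte string first decodes it implicitly as ASCII, which fails on non-ASCII bytes such as 0xc3. That is why the error is a UnicodeDecodeError even though the failing call is an encode.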
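
As a minimal sketch of that fix, assuming the archive member holds UTF-8 text: the example below builds a small in-memory tar.gz, decodes the bytes returned by extractfile() before assigning them to an Element, and then writes the tree. The file name sample.txt, its contents, and the helper names make_sample_archive and read_member_as_text are hypothetical, not taken from the original script. It also writes with encoding="utf-8" rather than "us-ascii", on the assumption that the output should keep the non-ASCII characters.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def make_sample_archive():
    """Build an in-memory tar.gz with one UTF-8 encoded file (hypothetical data)."""
    payload = u"caf\u00e9 cr\u00e8me".encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="sample.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_member_as_text(archive, member_name):
    """Read one archive member and decode its bytes from UTF-8 to text."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        raw = tar.extractfile(member_name).read()  # raw bytes, not text
    return raw.decode("utf-8")  # decode before handing the data to ElementTree


root = ET.Element("doc")
root.text = read_member_as_text(make_sample_archive(), "sample.txt")

# Serialize only the text content; the encoding applies to the decoded text.
out = io.BytesIO()
ET.ElementTree(root).write(out, encoding="utf-8", method="text")
result = out.getvalue()
```

Decoding once, at the point where the bytes leave the archive, keeps every string inside the tree as text, so write() only ever has to encode.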