4

I think I am following the right approach but I am still getting an encoding error:

from xml.dom.minidom import Document
import codecs

doc = Document()
wml = doc.createElement("wml")
doc.appendChild(wml)

property = doc.createElement("property")
wml.appendChild(property)

descriptionNode = doc.createElement("description")
property.appendChild(descriptionNode)
descriptionText = doc.createTextNode(description.decode('ISO-8859-1'))
descriptionNode.appendChild(descriptionText)

file = codecs.open('contentFinal.xml', 'w', encoding='ISO-8859-1')
file.write(doc.toprettyxml())
file.close()

The description node contains some characters in ISO-8859-1 encoding, this is encoding specified by the site it self in meta tag. But when doc.toprettyxml() starts writing in file I got following error:

Traceback (most recent call last):
File "main.py", line 467, in <module>
    file.write(doc.toprettyxml())
File "C:\Python27\lib\xml\dom\minidom.py", line 60, in toprettyxml
    return writer.getvalue()
File "C:\Python27\lib\StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 10: ordinal not in range(128)

Why am I getting this error as I am decoding and encoding with same standard?

Edited

I have following deceleration in my script file:

#!/usr/bin/python
# -*- coding: utf-8 -*-

may be this is conflicting?

3
  • I don't think the Python declaration is the problem. Can you post the content of description, maybe? Commented Sep 28, 2011 at 21:33
  • Lets say if we have this word Orientación in content we got error. The site is in espanol language. Don't know about content as it fails before writing it in to file. If i do not use encoding at all then it writes in to file but that is not accepted by browser and they keep saying error on line blah on column blah Commented Sep 28, 2011 at 21:39
  • Sounds like your description object is a byte-string not a character string. Commented Sep 29, 2011 at 19:51

1 Answer 1

1

Ok i have found a solution. When ever data is in other foriegn language you just need to defined the proper encoding in xml header. You do not need to describe encoding in file.write(doc.toprettyxml(encoding='ISO-8859-1')) not even when you are opening a file for writing file = codecs.open('contentFinal.xml', 'w', encoding='ISO-8859-1'). Below is the technique which i used. May be This is not a professional method but that works for me.

file = codecs.open('abc.xml', 'w')
xm = doc.toprettyxml()
xm = xm.replace('<?xml version="1.0" ?>', '<?xml version="1.0" encoding="ISO-8859-1"?>')
file.write(xm)
file.close()

May be there is a method to set default encoding in header but i could not find it. Above method does not bring any error on browser and all data display perfectly.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.