Suppose I have strings with lots of stuff like
“words words words
Is there a way to convert these directly in Python into the characters they represent?
I tried
import HTMLParser
h = HTMLParser.HTMLParser()
print h.unescape(x)
but got this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
I also tried
print h.unescape(x).encode('utf-8')
but it renders
&#226;&#128;&#156; as â
when it should be a quotation mark.
&#226;&#128;&#156; should be a comma? What webpage is this coming from? To convert them to the characters they represent, h.unescape(x) does that, but there are problems when you try to print the result. Try looking at its repr.
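
One possible explanation, offered as a guess rather than a confirmed diagnosis: 226, 128, 156 are the bytes E2 80 9C, which is the UTF-8 encoding of the left double quotation mark (U+201C). If that is what happened, each entity stands for one byte rather than one character, so the bytes have to be reassembled and decoded as UTF-8. Below is a minimal sketch under that assumption. It is written for Python 3 (the question uses Python 2), and decode_byte_entities is a hypothetical helper name, not part of any library. It turns the numeric references into code points by hand so that every value maps to exactly one byte.

```python
import re

def decode_byte_entities(text):
    # Assumption: every &#N; reference holds one byte (N < 256) of a
    # UTF-8 encoded string. Map each reference to the code point N.
    one_char_per_byte = re.sub(
        r"&#(\d+);", lambda m: chr(int(m.group(1))), text
    )
    # latin-1 maps code points 0-255 to identical byte values, which
    # recovers the raw bytes; those are then decoded as UTF-8.
    # Raises UnicodeEncodeError if a reference is 256 or larger.
    return one_char_per_byte.encode("latin-1").decode("utf-8")

x = "&#226;&#128;&#156;words words words"
fixed = decode_byte_entities(x)
print(ascii(fixed))  # prints '\u201cwords words words'
```

In Python 2 the same idea would probably look like h.unescape(x).encode('latin-1').decode('utf-8'), though I have not tested that here. If the source page really contains ordinary one-entity-per-character references, this round trip is the wrong tool and plain unescaping is enough.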