I have a problem reading a web page that doesn't specify a charset. It contains some non-ASCII characters, such as the euro sign, and my browser renders it fine. In Firefox, Page Info shows the encoding used as 'ISO-8859-1' and the render mode as 'Quirks mode'. However, python-requests can't decode those non-ASCII characters properly, and I get an error when I try to write the resulting string to a text file. Example:
import requests
result = requests.get(url)
result.encoding = 'ISO-8859-1'
html = result.text
open('textfile.txt', 'w').write(html)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80'
If u'\x80' represents the euro sign in the 'ISO-8859-1' encoding, then this should work:
print '\x80'.decode('ISO-8859-1')
but I get a non-printable character, not the euro sign.
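One thing that may be relevant: as far as I can tell, browsers treat a page labelled ISO-8859-1 as Windows-1252, and the byte 0x80 is the euro sign only in Windows-1252. Below is a minimal sketch of that comparison, assuming the page really is Windows-1252 (I have not confirmed this). The `save_text` helper and its file path argument are hypothetical names of mine, not part of requests.

```python
# -*- coding: utf-8 -*-
import io

raw = b'\x80'  # the single byte in question

# In strict ISO-8859-1 this byte maps to U+0080, a C1 control character.
as_latin1 = raw.decode('ISO-8859-1')

# In Windows-1252 (cp1252) the same byte maps to U+20AC, the euro sign.
as_cp1252 = raw.decode('cp1252')


def save_text(path, text):
    # Hypothetical helper: io.open accepts an explicit encoding on both
    # Python 2 and Python 3, so the write does not fall back to 'ascii'.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
```

If that assumption holds, setting `result.encoding = 'cp1252'` before reading `result.text`, and writing the file with an explicit encoding as in `save_text`, might avoid both the wrong character and the UnicodeEncodeError.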
So why does that web page work in a browser, while requests (and urllib/urllib2 too) can't handle that encoding? I also tried 'utf-8', with the same result. Any suggestions?