I have a file with some non-ASCII characters.
$ file -bi companies.txt
text/plain; charset=utf-8
On my desktop with Python 3.4 I can open this file with no problems:
>>> open('companies.txt').read()
'...'
On a CI system with Python 3.3 I get this:
>>> open('companies.txt').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1223: ordinal not in range(128)
But if I explicitly specify encoding='utf8', it works:
>>> open('companies.txt', encoding='utf8').read()
'...'
On both systems, sys.getdefaultencoding() returns 'utf-8'.
Any ideas what is causing the systems to behave differently? Why is the CI system trying to use ascii?
What does locale.getpreferredencoding() return on each system?
'ANSI_X3.4-1968', so that would explain the difference. I wasn't aware of that being a thing. If you write that in an answer I will accept it.
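For anyone hitting the same thing, here is a minimal sketch of the check discussed in the comments. It assumes that open() in text mode, when no encoding= is given, falls back to the locale's preferred encoding rather than to sys.getdefaultencoding(); that matches the Python 3.3/3.4 behaviour seen here, though newer versions can differ (for example in UTF-8 mode). 'ANSI_X3.4-1968' is just another name for ASCII. The helper names describe_encodings and read_text, and the sample company name, are made up for illustration.

```python
import locale
import os
import sys
import tempfile


def describe_encodings():
    """Return the two encodings that are easy to confuse."""
    return {
        # Default for str.encode() / bytes.decode(); 'utf-8' on Python 3.
        "sys.getdefaultencoding": sys.getdefaultencoding(),
        # Typically what open() uses in text mode when encoding= is omitted;
        # it follows the process locale (LANG, LC_ALL, ...).
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
    }


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding, independent of the locale."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical sample data: a UTF-8 file containing a non-ASCII character.
sample = "Čokoláda s.r.o.\n"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

try:
    print(describe_encodings())

    # An explicit encoding works no matter what the locale says.
    text = read_text(path)

    # Forcing ASCII reproduces the kind of failure seen on the CI system.
    try:
        read_text(path, encoding="ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
finally:
    os.remove(path)
```

Passing encoding= explicitly, as in read_text, keeps the result the same on both machines. Alternatively, setting the CI environment's locale to a UTF-8 one (for example through LANG or LC_ALL, if such a locale is installed there) should change what locale.getpreferredencoding() reports.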