I have a file with some non-ASCII characters.

$ file -bi companies.txt
text/plain; charset=utf-8

On my desktop with Python 3.4 I can open this file with no problems:

>>> open('companies.txt').read()
'...'

On a CI system with Python 3.3 I get this:

>>> open('companies.txt').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1223: ordinal not in range(128)

But if I explicitly specify encoding='utf8', it works:

>>> open('companies.txt', encoding='utf8').read()
'...'

On both systems, sys.getdefaultencoding() returns 'utf-8'.

Any ideas what is causing the systems to behave differently? Why is the CI system trying to use ascii?

  • Always specify the encoding if you know it, rather than relying on the default encoding, which may change. Commented Jan 14, 2015 at 0:33
  • That may be good advice, but my question is specifically about why the two systems behave differently despite the default encoding (as far as I can tell) being the same. Commented Jan 14, 2015 at 0:40
  • @AndrewMagee. What does locale.getpreferredencoding() return on each system? Commented Jan 14, 2015 at 0:42
  • Ah, that looks like it could be it. On the CI system, that returns 'ANSI_X3.4-1968' so that would explain the difference. I wasn't aware of that being a thing. If you write that in an answer I will accept it. Commented Jan 14, 2015 at 0:49

1 Answer

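When open() is called in text mode without an encoding argument, the encoding is determined by locale.getpreferredencoding(False), not by sys.getdefaultencoding(). On Python 3, sys.getdefaultencoding() always returns 'utf-8', which is why it is the same on both systems. 'ANSI_X3.4-1968' is the official name for ASCII and is what the C/POSIX locale reports, so the CI system is probably running without a UTF-8 locale configured (for example, with LANG unset).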
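
To compare the two settings on each machine, something like the following rough sketch may help. The helper names describe_encodings and read_text are hypothetical, chosen only for illustration, and the sketch assumes the file is known to be UTF-8, as file reported in the question.

```python
import locale
import sys


def describe_encodings():
    """Return the two encodings that are easy to confuse.

    Hypothetical helper, not part of any library.
    """
    return {
        # Encoding used for implicit str/bytes conversions; 'utf-8' on Python 3.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # Locale encoding, which open() falls back to when no encoding is passed.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default.

    Assumes the file really is UTF-8 unless another encoding is passed.
    """
    with open(path, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Print both values so they can be compared between the desktop and CI.
    for name, value in describe_encodings().items():
        print(name, "->", value)
```

Running this on both systems should show the same first value and, in the situation described in the question, different second values. Calling read_text('companies.txt') sidesteps the difference because the encoding no longer depends on the locale.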
