I have a program that, when run in Python 2.7, produces proper Unicode output to the standard output. When run in Python 2.4, I get UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128). What changed between version 2.4 and 2.7 that this works now?

  • We're not psychic. Show the code. Commented Aug 24, 2011 at 21:24
  • @Karl Knechtel: It just comes from a statement like: sys.stdout.write(unicode(data)) or sys.stdout.write(data). The problem is, this means that the problem is originating from somewhere else...and I have no idea where (the application is relatively large). Commented Aug 24, 2011 at 21:37
  • Try import sys; print sys.getdefaultencoding() to see if the default unicode-to-string encoding is different between the two. Commented Aug 24, 2011 at 22:45
  • @Russell Borogove: Okay, interesting, let me see... both return "ascii"! How puzzling! Commented Aug 24, 2011 at 22:47
  • Is there anything more you can tell us about the data (its source, its value) at the point of failure? Commented Aug 24, 2011 at 22:51
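
Following up on the suggestion to compare encodings: sys.getdefaultencoding() is only one of the settings involved, and the encoding attached to sys.stdout is a separate value. A minimal sketch for gathering both under each interpreter is below. The function name describe_encodings is hypothetical, not something from the question.

```python
import locale
import sys


def describe_encodings(stream=None):
    """Collect the settings that influence how text is written to a stream.

    describe_encodings is a hypothetical helper name used for illustration.
    """
    if stream is None:
        stream = sys.stdout
    return {
        # Encoding used for implicit unicode <-> str conversions.
        "default": sys.getdefaultencoding(),
        # Encoding attached to the stream itself; may be missing or None.
        "stream": getattr(stream, "encoding", None),
        # Whether the stream is connected to a terminal.
        "isatty": bool(getattr(stream, "isatty", lambda: False)()),
        # Encoding suggested by the locale settings (e.g. LANG).
        "locale": locale.getpreferredencoding(),
    }
```

Running this under both versions, once on a terminal and once with the output piped, shows whether the "stream" value differs even when "default" is "ascii" in both cases. On Python 2 the stream encoding is typically None when output is not a terminal.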

1 Answer

Although I could not find any mention of it elsewhere, it appears that Python 2.7 automatically encodes unicode text written to sys.stdout using the terminal's encoding, instead of raising an error as expected.

Python 2.7:

> echo $LANG
en_US.UTF-8
> python -c 'import sys; print sys.getdefaultencoding()'
ascii

> python -c 'import sys; sys.stdout.write(u"\u03A3")'
Σ
> python -c 'import sys; sys.stdout.write(u"\u03A3".encode("utf8"))'
Σ

Python 2.6 (on another box):

> echo $LANG
en_US.UTF-8
> python -c 'import sys; print sys.getdefaultencoding()'
ascii

> python -c 'import sys;  sys.stdout.write(u"\u03A3")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03a3' in position 0: ordinal not in range(128)
> python -c 'import sys;  sys.stdout.write(u"\u03A3".encode("utf8"))'
Σ

In any case, calling .encode("utf8") on the data before writing it should avoid the issue, since the explicitly encoded write succeeds on both versions above.
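
For a large application it may be easier to route output through a single helper than to add .encode("utf8") at every call site. The following is only a sketch under that assumption. The names to_utf8_bytes and write_utf8 are hypothetical, and the code is written so that it also runs on Python 3, where the byte stream lives at sys.stdout.buffer.

```python
import sys


def to_utf8_bytes(data):
    """Return data as UTF-8 bytes; byte strings pass through unchanged."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    if isinstance(data, text_type):
        return data.encode("utf-8")
    return data


def write_utf8(data, stream=None):
    """Write data to a byte stream as UTF-8 (hypothetical helper)."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # on Python 2, sys.stdout already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(to_utf8_bytes(data))
```

Calling write_utf8(u"\u03A3") then takes the place of sys.stdout.write(u"\u03A3"), and the implicit ASCII conversion is never attempted because the data reaches the stream already encoded.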

3 Comments

Why don’t you just set the stream encoding for stdout to be UTF-8 all the time? That saves the ridiculous and massive annoyance of this particular sort.
Not the answer to his question, but if anyone is interested: import sys,codecs; sys.stdout = codecs.getwriter('utf8')(sys.stdout)
Thanks. I run with PYTHONIOENCODING set to utf8 myself, but most people seem to accept Python’s heisencoding strategy. It weirds me out too much.
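
The codecs.getwriter one-liner from the comments can be spelled out as a small sketch. The function names utf8_writer and wrap_stdout_utf8 are hypothetical, and the getattr fallback is an addition so that the same code also runs on Python 3.

```python
import codecs
import sys


def utf8_writer(byte_stream):
    """Wrap a byte stream so that text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf8")(byte_stream)


def wrap_stdout_utf8():
    """Replace sys.stdout with a UTF-8 encoding writer (hypothetical helper)."""
    # On Python 3 the byte stream is sys.stdout.buffer; on Python 2,
    # sys.stdout itself accepts byte strings.
    byte_stream = getattr(sys.stdout, "buffer", sys.stdout)
    sys.stdout = utf8_writer(byte_stream)
```

One caveat: on Python 2, once stdout is wrapped this way, writing a byte string that already contains non-ASCII bytes fails, because the writer first tries to decode it as ASCII. The wrapper is therefore best fed unicode objects only. PYTHONIOENCODING, by contrast, is an environment variable and has to be set before the interpreter starts.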
