Why do I get an ASCII encoding error with Unicode data in Python 2.4 but not in 2.7?

Question

I have a program that produces proper Unicode output on standard output when run under Python 2.7. When run under Python 2.4, it fails with UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128). What changed between versions 2.4 and 2.7 that makes this work now?

@Karl Knechtel: It comes from a statement like sys.stdout.write(unicode(data)) or sys.stdout.write(data). The trouble is that this means the problem originates somewhere else, and I have no idea where (the application is relatively large).
– Keith Pinson, Commented Aug 24, 2011 at 21:37
Try import sys; print sys.getdefaultencoding() to see whether the default unicode-to-string encoding differs between the two.
– Russell Borogove, Commented Aug 24, 2011 at 22:45
@Russell Borogove: Okay, interesting, let me see... both return "ascii"! How puzzling!
– Keith Pinson, Commented Aug 24, 2011 at 22:47
Is there anything more you can tell us about the data (its source, its value) at the point of failure?
– Russell Borogove, Commented Aug 24, 2011 at 22:51
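For anyone debugging the same symptom, a small diagnostic along these lines reports the value suggested above next to the encoding Python attached to the stream itself, which is typically what differs between a terminal and a pipe (in Python 2, a redirected sys.stdout usually has encoding None). It is a sketch written to run under both Python 2 and Python 3; the function name describe_encodings is hypothetical.

```python
import sys


def describe_encodings(stream=None):
    """Collect the encoding settings relevant to writing text to a stream.

    sys.getdefaultencoding() is the codec used for implicit unicode-to-str
    conversion; the stream's own `encoding` attribute is what a
    terminal-aware write would use instead.
    """
    if stream is None:
        stream = sys.stdout
    return {
        "default": sys.getdefaultencoding(),
        # getattr() because not every file-like object defines .encoding
        "stream": getattr(stream, "encoding", None),
        # a terminal usually gets a locale-derived encoding; a pipe may not
        "isatty": hasattr(stream, "isatty") and stream.isatty(),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_encodings().items()):
        sys.stdout.write("%s: %r\n" % (key, value))
```

Running it once in a terminal and once piped through cat should show whether the stream encoding, rather than the default encoding, is what changes.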

Gringo Suave · Accepted Answer · 2011-08-25 18:52:48Z


Although I could not find any mention of it elsewhere, it appears that Python 2.7 automatically converts Unicode text to the terminal encoding when writing it to stdout, instead of raising an error as expected.

Python 2.7:

> echo $LANG
en_US.UTF-8
> python -c 'import sys; print sys.getdefaultencoding()'
ascii

> python -c 'import sys; sys.stdout.write(u"\u03A3")'
Σ
> python -c 'import sys; sys.stdout.write(u"\u03A3".encode("utf8"))'
Σ

Python 2.6 (on another box):

> echo $LANG
en_US.UTF-8
> python -c 'import sys; print sys.getdefaultencoding()'
ascii

> python -c 'import sys; sys.stdout.write(u"\u03A3")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03a3' in position 0: ordinal not in range(128)
> python -c 'import sys; sys.stdout.write(u"\u03A3".encode("utf8"))'
Σ
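The 2.6 traceback comes down to an implicit encode with the 'ascii' codec, which has no byte for U+03A3. A minimal sketch that reproduces both outcomes without depending on the terminal (written for Python 2 and 3; the helper try_encode is a hypothetical name):

```python
import sys

SIGMA = u"\u03a3"  # GREEK CAPITAL LETTER SIGMA, as in the transcripts


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The failing case: 'ascii' cannot represent U+03A3, so encode() raises.
ascii_result = try_encode(SIGMA, "ascii")
# The working case: an explicit UTF-8 encode turns U+03A3 into two bytes.
utf8_result = try_encode(SIGMA, "utf-8")

if __name__ == "__main__":
    sys.stdout.write("ascii: %r\nutf-8: %r\n" % (ascii_result, utf8_result))
```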

In any case, calling .encode("utf8") on the data before writing it should avoid the issue.
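A minimal sketch of that advice, assuming the target accepts bytes: a Python 2 file object such as sys.stdout, or io.BytesIO / sys.stdout.buffer under Python 3. The helper name write_encoded is hypothetical, and an in-memory buffer stands in for stdout so the bytes can be checked.

```python
import io


def write_encoded(stream, text, encoding="utf-8"):
    """Encode a unicode string explicitly and write the bytes to `stream`.

    Encoding up front means the write never falls back on the implicit
    'ascii' conversion that raised UnicodeEncodeError.
    """
    data = text.encode(encoding)
    stream.write(data)
    return len(data)  # number of bytes written


buf = io.BytesIO()
written = write_encoded(buf, u"\u03a3\n")
```

Against a real terminal the call would be write_encoded(sys.stdout, data) under Python 2, or write_encoded(sys.stdout.buffer, data) under Python 3.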

edited Aug 25, 2011 at 18:52

answered Aug 24, 2011 at 23:43

Gringo Suave

3 Comments

tchrist Over a year ago

Why.encode("utf-8") don’t.encode("utf-8") you.encode("utf-8") just.encode("utf-8") set.encode("utf-8") the.encode("utf-8") stream.encode("utf-8") encoding.encode("utf-8") for.encode("utf-8") stdout.encode("utf-8") to.encode("utf-8") be.encode("utf-8") UTF-8.encode("utf-8") all.encode("utf-8") the.encode("utf-8") time?.encode("utf-8") That.encode("utf-8") saves.encode("utf-8") the.encode("utf-8") ridiculous.encode("utf-8") and.encode("utf-8") massive.encode("utf-8") annoyance.encode("utf-8") of.encode("utf-8") this.encode("utf-8") particular.encode("utf-8") sort..encode("utf-8")

Gringo Suave Over a year ago

Not the answer to his question, but if anyone is interested: import sys,codecs; sys.stdout = codecs.getwriter('utf8')(sys.stdout)
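The one-liner above works because codecs.getwriter() returns a StreamWriter class, and an instance wrapped around a byte stream encodes every unicode string passed to write(). A minimal sketch of the same wrapper around an in-memory buffer, so the produced bytes are visible (under Python 3 the equivalent target for real output would be sys.stdout.buffer rather than sys.stdout):

```python
import codecs
import io

buf = io.BytesIO()  # stands in for sys.stdout in the comment's one-liner
out = codecs.getwriter("utf-8")(buf)
out.write(u"\u03a3\n")  # encoded to UTF-8 by the writer, then written to buf
```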

tchrist Over a year ago

Thanks. I run with PYTHONIOENCODING set to utf8 myself, but most people seem to accept Python’s heisencoding strategy. It weirds me out too much.
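The PYTHONIOENCODING approach can be checked from a script. The sketch below starts a child interpreter with the variable set and captures what it writes through a pipe, where Python 2 would otherwise leave sys.stdout.encoding unset. It assumes the child (the same interpreter, via sys.executable) is Python 2.7 or 3, since the 2.6 transcript above shows sys.stdout.write not applying a stream encoding; run_with_ioencoding is a hypothetical name.

```python
import os
import subprocess
import sys

# Child program: write a non-ASCII character through sys.stdout.
CHILD = 'import sys; sys.stdout.write(u"\\u03a3")'


def run_with_ioencoding(encoding):
    """Run CHILD with PYTHONIOENCODING set; return its raw stdout bytes."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = encoding
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, env=env)
    out, _ = proc.communicate()
    return out


if __name__ == "__main__":
    sys.stdout.write("%r\n" % run_with_ioencoding("utf-8"))
```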
