Python: Emit some Utf-8 string to windows console [duplicate]

Question

Possible Duplicate:
Python, Unicode, and the Windows console

I read some strings from a file, and when I try to print these UTF-8 strings in the Windows console, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

I've tried to set the console encoding to UTF-8 with "chcp 65001", but then I get this error message:

LookupError: unknown encoding: cp65001
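
Whether the running interpreter knows a given codec name can be checked directly. The following is a minimal sketch, not part of the original question; the helper name `codec_available` is hypothetical. The result for `cp65001` depends on the Python version and platform, so no expected output is shown.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can look up the codec `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Same exception type as in the traceback above.
        return False
    return True


if __name__ == "__main__":
    for name in ("utf-8", "cp65001"):
        print(name, codec_available(name))
```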

This question has come up a few times. Here's one example with a workaround that may or may not work: stackoverflow.com/questions/5419/…
– Mark Ransom, Commented Apr 25, 2012 at 18:42
Daira Hopwood's answer on 878972 is the answer. On 2.7 it's wontfix (because backporting it there seems an even bigger PITA than backporting it to 3.1). In 3.3 it was added, but it is still buggy, even in 3.5, due to Microsoft strangeness, and it depends on the current font, not just chcp 65001. BTW, the relevant environment variable setting is SET PYTHONIOENCODING=utf-8 (you can try mbcs, too), but neither will work, because cp65001 is buggy and the WinAPI is buggy (I am not sure what 'mbcs' is supposed to do, but it won't help).
– n611x007, Commented Sep 8, 2015 at 10:07

Jiri · Accepted Answer · 2012-04-30 14:05:02Z

3

I recommend checking similar questions on Stack Overflow; there are many of them.

Anyway, you can do it this way:

Read from the file in whatever encoding it uses (for example UTF-8), but decode the strings to unicode.
For the Windows console, output unicode strings. You don't need to encode in this special case, and you don't need to set the console encoding; the output text will be encoded correctly automatically.

For files, you need to use the codecs module or encode explicitly in the proper encoding.
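
The steps above could be sketched as follows. This is an illustration under assumptions, not the answerer's own code: the helper names `read_lines` and `write_lines` are hypothetical, and `io.open` is used here (instead of the codecs module the answer mentions) because it accepts an `encoding` argument on both Python 2.6+ and Python 3.

```python
import io


def read_lines(path, encoding="utf-8"):
    """Read a text file and return its lines as decoded (unicode) strings."""
    with io.open(path, "r", encoding=encoding) as handle:
        # Decoding happens inside io.open; strip only the line terminator.
        return [line.rstrip(u"\r\n") for line in handle]


def write_lines(path, lines, encoding="utf-8"):
    """Write unicode strings to a file, encoding them explicitly."""
    with io.open(path, "w", encoding=encoding) as handle:
        for line in lines:
            handle.write(line + u"\n")
```

Passing each string returned by `read_lines` straight to `print`, without calling `.encode()` on it, hands unicode to the console as the answer recommends. Whether every character then displays still depends on the console encoding.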

edited Apr 30, 2012 at 14:05

answered Apr 30, 2012 at 13:57


4 Comments

Mark Ransom Over a year ago

Good advice, but it should be noted that if you're expecting multiple language support on the console this won't provide it.

n611x007 Over a year ago

Did this actually work for you? I get LookupError: unknown encoding: cp65001 even before I read the first byte from the file, so it seems totally unrelated to the string contents. It is as if Python doesn't understand cp65001 but tries to use it anyway. If I had to guess, this will never work unless you work around it or use Python 3.3.

Jiri Over a year ago

@naxa Yeah, Python does not understand cp65001. Do not chcp to 65001, or at least set PYTHONIOENCODING=utf-8 before calling Python. See also stackoverflow.com/questions/878972/…

n611x007 Over a year ago

Thanks, I found it in the meantime. The basic idea is to use Python 3.3+ and expect it to still be buggy (on 2.7 it's wontfix), and to set your console font to "DejaVu Sans Mono" or equivalent. :)
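
The effect of the PYTHONIOENCODING variable mentioned in these comments can be observed with a small experiment. This is a hedged sketch, not something posted in the thread: the helper name `child_stdout_encoding` is hypothetical, and the child's output is captured through a pipe, so it shows only what the child interpreter reports and says nothing about the cp65001 console problems discussed above.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set; return its sys.stdout.encoding."""
    # Copy the current environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```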

pepr · Accepted Answer · 2012-04-30 14:28:40Z

2

The print command tries to convert Unicode strings to the encoding supported by the console. Try:

>>> import sys
>>> sys.stdout.encoding
'cp852'

It shows which encoding Python believes the console supports. If a character cannot be converted to that encoding, there is no way to display it correctly.
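
To see in advance whether a string survives conversion to the console encoding, something like the following sketch could be used. The helper names `can_encode` and `printable` are hypothetical, and `cp852` is simply the value from the session above; substitute whatever `sys.stdout.encoding` reports on your machine.

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def printable(text, encoding=None):
    """Return `text` with characters the encoding lacks replaced by '?'."""
    # Fall back to ASCII when stdout reports no encoding (for example, when piped).
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    sample = u"a\u0436b"  # contains a Cyrillic letter that cp852 lacks
    print(can_encode(sample, "cp852"))
    print(printable(sample, "cp852"))
```

Replacing unsupported characters loses information, so this is only a display fallback, not a way to make the console show them.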

answered Apr 30, 2012 at 14:28

