
Possible Duplicate:
Python, Unicode, and the Windows console

I read some strings from a file, and when I try to print these UTF-8 strings in the Windows console, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

I tried setting the console encoding to UTF-8 with "chcp 65001", but then I get this error message:

LookupError: unknown encoding: cp65001
  • Fixed in Python 3.3. Commented Apr 25, 2012 at 18:31
  • Is there some workaround for Python 2.7? Commented Apr 25, 2012 at 18:37
  • This question has come up a few times. Here's one example with a workaround that may or may not work: stackoverflow.com/questions/5419/… Commented Apr 25, 2012 at 18:42
  • Check this out: stackoverflow.com/questions/878972/… Commented Apr 26, 2012 at 10:55
  • Daira Hopwood's answer on 878972 is the answer. On 2.7 it's wontfix (apparently backporting it there is an even bigger PITA than backporting it to 3.1). In 3.3 it was added, but it is still buggy, even in 3.5, due to Microsoft strangeness, and it depends on the current font, not just chcp 65001. BTW, the relevant environment variable setting is SET PYTHONIOENCODING=utf-8 (you can try mbcs, too), but neither will work, because cp65001 is buggy and the WinAPI is buggy (I am not sure what 'mbcs' is supposed to do, but it won't help). Commented Sep 8, 2015 at 10:07

2 Answers


I recommend checking the similar questions on Stack Overflow; there are many of them.

Anyway, you can do it this way:

  1. Read from the file in whatever encoding it uses (for example UTF-8), and decode the strings to unicode.
  2. For the Windows console, output the unicode strings directly. You don't need to encode them in this special case, and you don't need to set the console encoding: the output text is encoded automatically to the console's code page.

For files, you need to use the codecs module or encode to the proper encoding yourself.
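The two steps and the note about files can be sketched roughly as follows. This is only an illustration, not the only way to do it: the helper names read_lines and write_lines are made up for this example, and io.open is used in place of the codecs module on the assumption that either is acceptable (io.open behaves the same on Python 2.7 and 3).

```python
import io


def read_lines(path, encoding="utf-8"):
    """Read a text file and return its lines as decoded unicode strings.

    Hypothetical helper: io.open decodes the bytes while reading, so the
    caller never handles raw UTF-8 byte strings.
    """
    with io.open(path, "r", encoding=encoding) as f:
        return [line.rstrip(u"\n") for line in f]


def write_lines(path, lines, encoding="utf-8"):
    """Write unicode strings to a file, encoding them explicitly.

    Hypothetical helper: unlike the console, a file has no encoding of its
    own, so it must be stated here.
    """
    with io.open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + u"\n")
```

With these, printing is just `for line in read_lines("names.txt"): print(line)` (the file name is a placeholder). The strings are already unicode, so Python encodes them for the console itself; characters outside the console's code page can still fail.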


4 Comments

Good advice, but it should be noted that if you're expecting multiple language support on the console this won't provide it.
Did this actually work for you? I get LookupError: unknown encoding: cp65001 even before I read the first byte from the file, so it seems totally unrelated to the string contents. My guess is that Python doesn't know cp65001 but tries to use it anyway, which can never work unless you work around it or use Python 3.3.
@naxa Yeah, Python does not understand cp65001. Do not chcp to 65001, or at least set PYTHONIOENCODING=utf-8 before calling python. See also stackoverflow.com/questions/878972/…
Thanks, I found it in the meantime. The basic idea is to use Python 3.3+ and expect it to still be buggy (on 2.7 it's wontfix), and to set your console font to "DejaVu Sans Mono" or equivalent. :)

The print statement tries to convert Unicode strings to the encoding supported by the console. Try:

>>> import sys
>>> sys.stdout.encoding
'cp852'

This shows which encoding the console supports (more precisely, which encoding Python has been told it supports). If a character cannot be converted to that encoding, there is no way to display it correctly.
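If you would rather get a placeholder than an exception for such characters, one possible sketch is below. The helper name console_safe is invented for this example, and falling back to "ascii" when sys.stdout.encoding is missing (for instance when output is piped on Python 2) is an assumption, not something Python does for you.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'.

    Hypothetical helper: it round-trips the unicode string through the
    console encoding using the "replace" error handler.
    """
    # Fall back to ASCII if the stream reports no encoding (assumption).
    enc = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(enc, "replace").decode(enc)
```

For example, `print(console_safe(some_unicode_string))` prints what the console can show and a `?` for everything else. This hides the error; it does not make the missing characters displayable.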
