python: unicode in Windows terminal, encoding used?

Question

I am using the Python interpreter in Windows 7 terminal.
I am trying to wrap my head around unicode and encodings.

I type:

>>> s='ë'
>>> s
'\x89'
>>> u=u'ë'
>>> u
u'\xeb'

Question 1: Why is the encoding used in the string s different from the one used in the unicode string u?

I continue, and type:

>>> us=unicode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 0: ordinal
not in range(128)
>>> us=unicode(s, 'latin-1')
>>> us
u'\x89'

Question 2: I tried the latin-1 encoding on the off chance that it would turn the string into a unicode string (actually, I tried a bunch of other ones first, including utf-8). How can I find out which encoding the terminal has used to encode my string?

Question 3: How can I make the terminal print ë as ë instead of '\x89' or u'\xeb'? Hmm, stupid me. print(s) does the job.

I already looked at this related SO question, but no clues from there: Set Python terminal encoding on Windows

In the first question, you're talking about representation, not encoding. s is an object that contains a single byte with a specific value; u is an object containing a single character. In both cases you see the repr of the object reported back.
– Karl Knechtel, Commented May 4, 2024 at 21:34

Mark Tolonen · Accepted Answer · 2018-03-07 07:00:46Z

14

Unicode is not an encoding. You encode into byte strings and decode into Unicode:

>>> '\x89'.decode('cp437')
u'\xeb'
>>> u'\xeb'.encode('cp437')
'\x89'
>>> u'\xeb'.encode('utf8')
'\xc3\xab'
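
The same round trip can be sketched in Python 3 syntax, where byte strings are written b'...' and ordinary strings are already Unicode. This is only an illustration, assuming the cp437 code page shown above; the variable names are made up for the example, and the session in the question is Python 2.

```python
# Python 3 sketch of the encode/decode round trip (the question uses Python 2).
raw = b"\x89"                        # the byte a cp437 console produced for 'ë'

text = raw.decode("cp437")           # bytes -> Unicode text (U+00EB)

# Encoding the same character with different codecs gives different bytes.
as_cp437 = text.encode("cp437")      # b'\x89'
as_latin1 = text.encode("latin-1")   # b'\xeb'
as_utf8 = text.encode("utf-8")       # b'\xc3\xab'

# Decoding with the wrong codec does not necessarily fail: latin-1 maps
# every byte to a code point, so this silently yields U+0089 instead of 'ë'.
wrong = raw.decode("latin-1")
```

That last line is what happened with unicode(s, 'latin-1') in the question: no error, but the result u'\x89' is not the character that was typed.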

The Windows terminal uses legacy code pages for DOS. For US Windows it is:

>>> import sys
>>> sys.stdout.encoding
'cp437'

Windows applications use Windows code pages. Python's IDLE will show the Windows encoding:

>>> import sys
>>> sys.stdout.encoding
'cp1252'

Your results may vary.
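
To see what your own setup reports, a small script along these lines may help. It is a sketch in Python 3 syntax using only the standard library; the values printed depend entirely on the system and on how the script is launched, so no particular output is implied.

```python
import locale
import sys

# Encoding Python uses for text written to standard output. This can be
# None if stdout has been replaced by an object without an encoding.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding Python uses to decode what is typed on standard input.
stdin_encoding = getattr(sys.stdin, "encoding", None)

# The locale's preferred encoding; on Windows this is typically the
# Windows ("ANSI") code page rather than the console's DOS code page.
preferred = locale.getpreferredencoding(False)

print("stdout:   ", stdout_encoding)
print("stdin:    ", stdin_encoding)
print("preferred:", preferred)
```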

edited Mar 7, 2018 at 7:00

answered Jun 14, 2011 at 20:05

Mark Tolonen


1 Comment

Rabarberski Over a year ago

Thanks for the sys.stdout.encoding tip. Now it is clear to me how I can determine the encoding used in the terminal

Cameron Lowell Palmer · Accepted Answer · 2016-05-16 18:34:22Z

7

Avoid Windows Terminal

I'm not going out on a limb by saying that the 'terminal', more appropriately the 'DOS prompt', that ships with Windows 7 is absolute junk. It was bad in Windows 95, NT, XP, Vista, and 7. Maybe they fixed it with PowerShell, I don't know. However, it is indicative of the kind of problems that were plaguing OS development at Microsoft at the time.

Output to a file instead

Set the PYTHONIOENCODING environment variable and then redirect the output to a file.

set PYTHONIOENCODING=utf-8

python myscript.py > output.txt

You can then open the file in Notepad++ to see the UTF-8 version of your output.
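
If you would rather not depend on an environment variable, the script can write the file itself with an explicit encoding. The following is a minimal sketch, not part of the original answer: write_utf8 is a hypothetical helper, and a temporary directory stands in for wherever output.txt should really go.

```python
import io
import os
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8, independent of the console code page."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Hypothetical usage: a temporary directory replaces the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_utf8(out_path, u"\xeb")

# Read the raw bytes back to confirm what actually landed on disk.
with io.open(out_path, "rb") as handle:
    written = handle.read()
```

Reading the file back in binary mode shows the two UTF-8 bytes for ë, regardless of which code page the terminal is using.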

Install win-unicode-console

win-unicode-console can fix your problems. You should try it out:

pip install win-unicode-console

If you are interested in a thorough discussion of the issue of Python and command-line output, check out Python issue 1602. Otherwise, just use the win-unicode-console package.

py -m run script.py

This enables it for a single script. Alternatively, you can follow their directions and call win_unicode_console.enable() on every invocation by adding it to usercustomize or sitecustomize.
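
A usercustomize.py along these lines is one way that could look. This is a sketch under assumptions, not the package's documented recipe: the helper name enable_unicode_console is made up here, and the guards only exist so the file does nothing harmful when the package is missing or the platform is not Windows.

```python
# usercustomize.py (sketch)
import sys


def enable_unicode_console():
    """Enable win_unicode_console if it is usable; return True on success."""
    if sys.platform != "win32":
        return False  # the package only targets the Windows console
    try:
        import win_unicode_console
    except ImportError:
        return False  # package not installed; keep the default streams
    win_unicode_console.enable()
    return True


enabled = enable_unicode_console()
```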

answered May 16, 2016 at 18:34

Cameron Lowell Palmer


2 Comments

Mark Ransom Over a year ago

Starting with Python 3.6 the Windows console is much more usable. Outputting to the console bypasses the code page nonsense entirely and works directly with Unicode.

Cameron Lowell Palmer Over a year ago

That's good to know. Next time I'm doing some Python development for a Windows shop, I'll try to push them forward.

Community · Accepted Answer · 2017-05-23 12:32:52Z

2

In case others get to this page when searching: the easiest way is to set the code page in the terminal first.