I need to print some strings to stdout. Because the Windows console uses code page 437 (cp437), printing a string that contains any character outside cp437 raises an exception (UnicodeEncodeError).

I got around this by

import sys

encoding = sys.stdout.encoding
pathstr = path.encode(encoding, errors="replace").decode(encoding)
print(pathstr)

where path is the str I want to output. I'm fine with characters replaced by "?".

This doesn't seem good because it converts to a byte array and back to a str.

Is there a better way to achieve this?

I'm still new to Python (a week, maybe) and I'm using 32-bit Windows 7 with CPython 3.3.

3 Answers

3

This doesn't seem good because it converts to a byte array and back to a str.

If you want to write raw bytes to the stream, use .buffer:

import sys

pathbytes = path.encode(sys.stdout.encoding, errors='replace')
sys.stdout.buffer.write(pathbytes)

...oh for the day that issue 1602 comes to something and we can avoid the Unicode horror of the Windows command prompt...
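One detail worth noting is that writing to .buffer bypasses print(), so no newline is appended and the text layer should be flushed first to keep output in order. Below is a minimal sketch of a helper that handles this; the name write_replaced and its parameters are hypothetical, and an in-memory io.BytesIO stands in for sys.stdout.buffer so the result can be inspected.

```python
import io
import sys


def write_replaced(text, stream=None, encoding=None):
    """Encode text with errors='replace' and write it, plus a newline, as bytes.

    Hypothetical helper: stream defaults to sys.stdout.buffer and encoding
    defaults to the encoding reported by sys.stdout.
    """
    if stream is None:
        # Flush pending text output so it is not reordered with the raw bytes.
        sys.stdout.flush()
        stream = sys.stdout.buffer
    if encoding is None:
        encoding = sys.stdout.encoding or "ascii"
    data = (text + "\n").encode(encoding, errors="replace")
    stream.write(data)
    stream.flush()
    return data


# Demonstration against an in-memory binary stream instead of the console.
buf = io.BytesIO()
write_replaced("caf\u00e9 \u0446", stream=buf, encoding="cp437")
shown = buf.getvalue().decode("cp437")  # what a cp437 console would display
```

Here the Cyrillic letter has no cp437 equivalent, so it comes out as "?", while "é" survives because cp437 contains it.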


1

I'm fine with characters replaced by "?"

You could set PYTHONIOENCODING environment variable:

C:\> set PYTHONIOENCODING=cp437:replace

And print Unicode strings directly:

print(path)

If you are redirecting the output to a file, you could instead set PYTHONIOENCODING to utf-8 and get the complete, correct output.
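To see the effect of the variable without touching the console, one option is to run a child interpreter with PYTHONIOENCODING set and capture its raw output. This is only a sketch: the helper name run_with_io_encoding is hypothetical, and subprocess.run requires Python 3.5 or later, so it would not run on the 3.3 interpreter mentioned in the question.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding):
    """Run code in a child Python with PYTHONIOENCODING set; return raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child decodes the escapes itself, so the command line stays pure ASCII.
child_code = r"print('caf\u00e9 \u0446\u044b\u043a')"

# With cp437:replace, characters missing from cp437 are written as "?".
replaced = run_with_io_encoding(child_code, "cp437:replace")

# With utf-8, every character is preserved, which suits redirection to a file.
full = run_with_io_encoding(child_code, "utf-8")
```

Decoding replaced as cp437 and full as UTF-8 shows the difference between the two settings.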

You could also try WriteConsoleW()-based solutions from the corresponding Python bug and see if they work on Python 3.3 e.g.:

import _win_console
_win_console.install_unicode_console()

print("cyrillic: цык.")

Where _win_console is from win_console.patch. You don't need to set the environment variable in this case and it should work with any codepage (with an appropriate console font, it might even show characters outside the current codepage).

All solutions for the problem of printing Unicode inside the Windows console have drawbacks (see the discussion and the reference links in the bug tracker for all the gory details).


0

The best advice I ever heard about Unicode was to make a Unicode Sandwich:

  1. Immediately convert any incoming text in your program into Unicode.
  2. Deal exclusively with Unicode in your program.
  3. Export to whatever serialization format you want for your output.

In this case, you're basically doing just that. In a longer program, it would make sense to do this in the manner you describe, and I think you'd feel more comfortable about it.
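As a rough illustration of the three sandwich steps, here is a minimal sketch. The function name sandwich, the choice of UTF-8 for the input, and the upper-casing step are all assumptions made for the example, not part of the original question.

```python
def sandwich(raw_input, input_encoding, output_encoding):
    """Decode at the input edge, work on str, encode at the output edge."""
    # 1. Convert incoming bytes to a str immediately.
    text = raw_input.decode(input_encoding)
    # 2. Do all processing on str (upper-casing is just a placeholder step).
    text = text.upper()
    # 3. Encode only when leaving the program, replacing unencodable characters.
    return text.encode(output_encoding, errors="replace")


incoming = "\u043f\u0443\u0442\u044c/caf\u00e9".encode("utf-8")  # "путь/café" as bytes
outgoing = sandwich(incoming, "utf-8", "cp437")
displayed = outgoing.decode("cp437")  # what a cp437 console would show
```

The four Cyrillic letters are lost to "?" only at the final encoding step; everything in the middle operates on the full Unicode text.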

The only change I'd make would be to encode to utf-8, then decode to cp437 on output.

1 Comment

I can't make sense of your last sentence. You mean encode your unicode object to a UTF-8 string, and then decode that as if it were a CP437 string?
