UnicodeDecodeError when ssh from OS X

Question

My Django app loads some files on startup (or when I execute a management command). When I ssh in from one of my Arch or Ubuntu machines, everything works fine: I can successfully run any commands and migrations.

But when I ssh in from OS X (I have El Capitan) and try to do the same things, I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

To open my files I use with open(path_to_file) as f: ...

The error happens when sshing from both iTerm and Terminal. I found out that the cause was the LC_CTYPE environment variable. It wasn't set on my Linux machines, but on the Mac it was set to UTF-8, so after I ssh to the server it was set to the same value there. The error went away after I unset LC_CTYPE.

So the actual question is: what happened, and how do I avoid it in the future? I can unset this variable on my local machine, but will that have any negative effects? And what is the best way of doing this?

Prakhar Trivedi · Accepted Answer · 2016-10-14 09:04:29Z

5

The terminal on your local machine uses a character encoding, which appears to be UTF-8. When you log on to your server (BTW, what OS does it run?), the programs that run there need to know what encoding your terminal supports so that they can display text correctly. They get this information from LC_CTYPE. ssh passes the value along from your Mac, where it is UTF-8 because that's what your terminal supports.

When you unset LC_CTYPE (and no other locale variable such as LC_ALL or LANG is set), your programs use the default, ASCII. The programs now display in ASCII instead of UTF-8, which works because UTF-8 is backward compatible with ASCII. However, if a program needs to display a special character that does not exist in ASCII, it won't work.
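To make the ASCII vs. UTF-8 point concrete, here is a minimal sketch. The sample character and the helper name try_decode are my own choices for illustration, not something from your app. Byte 0xc3 is the first byte of many two-byte UTF-8 sequences, which is why it shows up in your traceback.

```python
# "\u00e9" (e with acute accent) encodes to the two bytes 0xc3 0xa9 in UTF-8.
data = "\u00e9".encode("utf-8")


def try_decode(raw, encoding):
    """Decode raw bytes, returning the error text instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: {}".format(exc)


ascii_result = try_decode(data, "ascii")  # fails: 0xc3 is outside range(128)
utf8_result = try_decode(data, "utf-8")   # succeeds

# Pure ASCII bytes decode identically under both codecs.
plain = b"hello"
same_for_ascii_text = plain.decode("ascii") == plain.decode("utf-8")

print(ascii_result)
print(repr(utf8_result))
print(same_for_ascii_text)
```

So ASCII-only content is unaffected by which of the two encodings is in effect; the failure only appears once a non-ASCII byte turns up.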

Although from the information you give it's not entirely clear to me why the system behaves in this way, I can tell you that unsetting LC_CTYPE is a bad workaround. To avoid problems in the future, it would be better to make sure that all the terminals on all your machines use UTF-8, and get rid of ASCII.

When you open a file in text mode without specifying an encoding, Python uses the locale's preferred encoding, which in practice means the terminal's (i.e. LC_CTYPE's) character set. I've never quite understood why it's made this way; why should the character set of your terminal indicate the encoding a file has? However, that's the way it's made, and the way to fix the problem correctly is to use the encoding parameter of open if you are using Python 3, or the codecs standard library module if you are using Python 2.
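A minimal sketch of that fix for Python 3, assuming your files are UTF-8 (I can't tell that from the question). The temporary file and its contents are made up for the example; in your app you would pass your own path_to_file.

```python
import locale
import os
import tempfile

# The encoding open() normally falls back to when none is given.
# It is derived from the locale settings, so it can differ per ssh session.
default_encoding = locale.getpreferredencoding(False)

# Create a throwaway UTF-8 file standing in for one of the app's data files.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# Passing encoding explicitly makes the read independent of LC_CTYPE.
with open(path, encoding="utf-8") as f:
    text = f.read()

os.remove(path)

print("locale default:", default_encoding)
print("decoded text:", repr(text))
```

With the encoding spelled out, the result is the same no matter which machine you ssh from. On Python 2, io.open accepts the same encoding argument, as an alternative to codecs.open.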

edited Oct 14, 2016 at 9:04

Prakhar Trivedi


answered Oct 14, 2016 at 7:51

Antonis Christofides



1 Comment

valignatev Over a year ago

Actually, I just googled and found that UTF-8 isn't a valid LC_CTYPE value, so instead I added export LC_ALL=en_US.UTF-8 and export LANG=en_US.UTF-8 to my .bash_profile, and that fixed the issue. Btw, the server runs Ubuntu 16.04. And I didn't know open uses LC_CTYPE to determine a file's encoding; it's weird indeed. Thanks for that. I'll accept your answer.
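For anyone who wants to check on the server whether a given locale name is accepted, here is a rough sketch. The helper name locale_is_available is made up for this example, and the result depends on which locales are installed on the machine where you run it, so I'm not showing any expected output.

```python
import locale


def locale_is_available(name):
    """Return True if setlocale() accepts `name` for LC_CTYPE on this machine."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore the setting that was in effect before the check.
        locale.setlocale(locale.LC_CTYPE, saved)


before = locale.setlocale(locale.LC_CTYPE)
results = {
    candidate: locale_is_available(candidate)
    for candidate in ("C", "UTF-8", "en_US.UTF-8")
}
after = locale.setlocale(locale.LC_CTYPE)

for candidate, ok in results.items():
    print(candidate, "accepted" if ok else "rejected")
```

If a name is rejected here, programs on that machine can't switch to it either and stay in their default locale.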

Francesca · Accepted Answer · 2018-04-07 10:30:32Z

1

I had a similar issue after updating OS X: when ssh-ing to a UNIX server, the copyright character was not encoded correctly because the UTF-8 locale was not properly set up. I solved the issue by unchecking the "Set locale environment variables on startup" setting in the preferences of my terminal(s).

answered Apr 7, 2018 at 10:30

Francesca

