4

My Django app loads some files on startup (or when I execute management command). When I ssh from one of my Arch or Ubuntu machines all works fine, I am able to successfully run any commands and migrations.

But when I ssh from OS X (I have El Capital) and try to do same things I get this error:

UnicodeDecodeError: 'ASCII' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

To open my files I use with open(path_to_file) as f: ...

The error happens when sshing from both iterm and terminal. I found out that reason was LC_CTYPE environment variable. It wasn't set on my other Linux machines but on mac it was UTF-8 so after I ssh to the server it was set the same. The error was fixed after I unset LC_CTYPE.

So the actual question is what has happened and how to avoid this further? I can unset this variable in my local machine but will it take some negative effects? And what is the best way of doing this?

2 Answers 2

5

Your terminal at your local machine uses a character encoding. The encoding it uses appears to be UTF-8. When you log on to your server (BTW, what OS does it run?) the programs that run there need to know what encoding your terminal supports so that they display stuff as needed. They get this information from LC_CTYPE. ssh correctly sets it to UTF-8, because that's what your terminal supports.

When you unset LC_CTYPE, then your programs use the default, ASCII. The programs now display in ASCII instead of UTF-8, which works because UTF-8 is backward compatible with ASCII. However, if a program needs to display a special character that does not exist in ASCII, it won't work.

Although from the information you give it's not entirely clear to me why the system behaves in this way, I can tell you that unsetting LC_CTYPE is a bad workaround. To avoid problems in the future, it would be better to make sure that all your terminals in all your machines use UTF-8, and get rid of ASCII.

When you try to open a file, Python uses the terminal's (i.e. LC_CTYPE's) character set. I've never quite understood why it's made this way; why should the character set of your terminal indicate the encoding a file has? However, that's the way it's made and the way to fix the problem correctly is to use the encoding parameter of open if you are using Python 3, or the codecs standard library module if you are using Python 2.

Sign up to request clarification or add additional context in comments.

1 Comment

Actually I just googled that UTF-8 isn't valid LC_CTYPE so I added instead export LC_ALL=en_US.UTF-8 and export LANG=en_US.UTF-8 to my .bash_profile and it fixed the issue. Btw I use Ubuntu 16.04 as a server machine. And I didn't know open uses LC_CTYPE for defining files encoding, it's weird indeed. Thanks for that. I'll accept your answer.
1

I had a similar issue after updating my OS-X, ssh-ing to a UNIX server the copyright character was not encoded cause the UTF-8 locale was not properly set up. I solved the issue unchecking the setting "Set locale environment variables on startup" in the preferences of my terminal(s).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.