
My question is essentially the same as the one below:

python input UnicodeDecodeError:

With s = input("Enter a name:"), if the user types a non-ASCII string such as علی, presses Backspace to edit the input, and then presses Enter, Python throws a UnicodeDecodeError:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdb in position 4: unexpected end of data

The accepted answer there doesn't give a specific solution; it attributes the problem to the terminal and server encoding. I set LC_ALL=en_US.UTF-8, but that didn't solve the problem, and the terminal is also set to UTF-8. My PC runs Ubuntu 20.04 and the server runs Ubuntu 16.04. The program runs on the server, and I am connected to it via ssh.

This is the output of locale on the server:

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

My Python version on the server is 3.7.10. I tested this on my laptop with Python 3.8.5 and had no problem. I then installed Python 3.8.8 on the server using conda, but the problem persists. Also, the problem does not occur in the interactive Python console; it occurs only when I run python inp.py or python3.8 inp.py, where inp.py is a file containing the input() call.

  • you press backspace before 'enter'? for me it does not reproduce on Python 3.5.3 Windows 10 Commented Apr 27, 2021 at 17:11
  • @Orb yes I hit backspace before enter. Can you write something unicode? like in a foreign script, .. Commented Apr 27, 2021 at 17:14
  • so that's what I did. on Windows it works. so it might be an OS compatibility issue. what Python version are you using (update your question). i know the input function changed from python 2 to python 3. Commented Apr 27, 2021 at 19:01
  • There are two consecutive Zero Width Joiner characters in your question; couldn't this be a culprit? Copy like ‍‍علی and and paste in Python prompt (between a pair of quotes); you get 'like \u200d\u200dعلی and'. Commented Apr 27, 2021 at 19:10
  • Please mention the Python version you're using. Commented Apr 27, 2021 at 20:22

1 Answer

I'm facing the same issue. I'm reading with: i=input(k + ': ')

I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 4: invalid continuation byte (the position varies) whenever the input contains an accented letter that is then deleted with Backspace. It happens even if I type additional characters after the Backspace, but not if I also delete an additional, non-accented character. The deleted letter does not have to be the last one.

Interestingly, I get this error only when the script is run in Jupyter Notebook's terminal (with the same LANG and LC_* environment variables set as in GNOME Terminal). If I run the same script in GNOME Terminal, I can't reproduce the problem.

So it seems that in the Jupyter terminal, Backspace deletes only one byte of a 2-byte UTF-8 character.
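The one-byte-deletion hypothesis can be checked without a terminal. The following is a minimal sketch that simulates what Python would receive if only the last byte of a multi-byte character were erased. The helper name try_decode is made up for illustration, and the sketch assumes the name from the question consists of the code points U+0639, U+0644 and U+06CC.

```python
def try_decode(raw: bytes) -> str:
    """Decode raw as UTF-8, or describe the failure instead of raising."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"{exc.reason} (byte 0x{raw[exc.start]:02x} at position {exc.start})"


# Assumed spelling of the name from the question: 3 characters, 6 bytes in UTF-8.
name = "\u0639\u0644\u06cc".encode("utf-8")
# Simulate a Backspace that erased one byte rather than one character.
truncated = name[:-1]
print(try_decode(truncated))
# unexpected end of data (byte 0xdb at position 4)

# Same idea with an accented letter, followed by the newline from Enter.
accented = "abcd\u00e1".encode("utf-8")
print(try_decode(accented[:-1] + b"\n"))
# invalid continuation byte (byte 0xc3 at position 4)
```

Under those assumptions, the first result matches the byte and position in the question's error message, and the second matches the message quoted in this answer, which is consistent with a stray lead byte being left in the input.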

Python versions:

  • problematic setup: jupyter: 3.11.7
  • working setup: ubuntu 22.04: 3.10.12

PS: I've now tested this further. If I run cat > /tmp/testfile, type the character á, delete it with Backspace, and then close the file (Ctrl+D), then on Ubuntu I end up with an empty file, while in Jupyter the file contains a single 0xc3 byte. This confirms my hypothesis, so I guess this might be a terminal emulator issue, or maybe a browser issue.
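If the terminal itself cannot be fixed, one possible workaround on the Python side is to read the raw bytes and decode them leniently. This is only a sketch, not a fix for the underlying cause: read_line_lenient is a hypothetical helper, and it silently drops any bytes that are not valid UTF-8.

```python
import io
import sys


def read_line_lenient(prompt: str = "", stream=None) -> str:
    """Read one line as bytes and decode it, dropping invalid UTF-8 bytes.

    Hypothetical replacement for input(); stream defaults to the binary
    layer of standard input and is a parameter so it can be tested.
    """
    if stream is None:
        stream = sys.stdin.buffer
    if prompt:
        sys.stdout.write(prompt)
        sys.stdout.flush()
    raw = stream.readline()
    # errors="ignore" discards stray bytes such as a leftover 0xc3.
    return raw.decode("utf-8", errors="ignore").rstrip("\r\n")


# Simulated input: "abcd", then half of an erased "á", then Enter.
fake_stdin = io.BytesIO(b"abcd\xc3\n")
print(read_line_lenient(stream=fake_stdin))
# abcd
```

In a script this would be called as i = read_line_lenient(k + ': ') in place of input(k + ': '). Because it reads from sys.stdin.buffer directly, it bypasses any line-editing support that input() may otherwise provide.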
