When I run

u''.startswith('x\x9c')

I get this exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in position 1: ordinal not in range(128)

Why is 'x\x9c' decoded as ASCII rather than treated as unicode, given that I am calling the method on the unicode string u''?

1 Answer

This happens because Python cannot decode 'x\x9c' with the ASCII codec: the byte \x9c is not an ASCII character. Try this:

import unidecode
u''.startswith(unidecode.unidecode_expect_nonascii('x\x9c'))

This returns False, because the unidecode library function now represents the string 'x\x9c' in ASCII form.

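More fundamentally, this happens because you mixed a unicode string with a byte string. For a.startswith(b), a and b should both be unicode or both be byte strings. When they are mixed, Python 2 first converts the byte string to unicode using its default codec, which is ASCII, and any byte above 0x7f then raises UnicodeDecodeError.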
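
To see the decoding step in isolation, here is a minimal sketch that runs on both Python 2 and Python 3. It calls decode('ascii') on the byte string directly, which is in effect what Python 2 does when a byte string meets a unicode string, and then decodes with an explicit codec instead. The choice of latin-1 is only an assumption for illustration; use whichever encoding the bytes were actually produced in.

```python
# The byte string from the question. The b prefix makes it a byte string
# on Python 3 as well (on Python 2, 'x\x9c' is already a byte string).
raw = b'x\x9c'

# Reproduce the failure: decoding with the ASCII codec rejects byte 0x9c.
try:
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decode explicitly so both operands are unicode.
# 'latin-1' is an assumed encoding, chosen only for this example.
text = raw.decode('latin-1')
result = u''.startswith(text)

print(error_message)
print(result)
```

With both operands as unicode, startswith compares text to text and no implicit ASCII decoding takes place.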

Hope this helps!

1 Comment

But why is it trying to convert x\x9c to ascii as opposed to unicode?
