For my project, everything must be in unicode. Here is how I handle it: every string is passed through this function:

def unicodify(string):
    if not isinstance(string, unicode):
        return string.decode('utf8', errors='ignore')
    return string

Is this good practice for production code? If not, why, and how would you suggest decoding to unicode? Note that errors='ignore' does not help with the ValueError 'invalid \x escape', and I'm not sure how to handle that properly.

Thanks

  • What version of Python do you use, 2 or 3? Commented Sep 13, 2013 at 7:27
  • @Tichodroma: 2, by the looks of it; Python 3 strings do not have a .decode() method. Commented Sep 13, 2013 at 7:28
  • The exception you see means that the input string is not UTF8 encoded. Commented Sep 13, 2013 at 7:29
  • Thanks so far, but it's still not clear to me what to do. The final output destination of all my strings should be unicode. Should I catch the ValueErrors and modify or skip those strings entirely? Go into the strings and remove the invalid \x escapes manually? Or try decoding with some other encoding? Thanks. Commented Sep 13, 2013 at 16:10

2 Answers

1

You may have an invalid string literal.

\x must be followed by exactly two hexadecimal digits (0-9, A-F, a-f).

Valid examples:

>>> '\xA9'
'\xa9'
>>> '\x00'
'\x00'
>>> '\xfF'
'\xff'

Invalid examples (the letter O instead of zero, the letter l instead of the digit 1, and only one hex digit):

>>> '\xOO'
ValueError: invalid \x escape
>>> '\xl3'
ValueError: invalid \x escape
>>> '\x5'
ValueError: invalid \x escape

See String literals.
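
One consequence worth spelling out: the error is raised while Python parses the literal, before any .decode() call runs, so errors='ignore' cannot affect it. Here is a minimal sketch of that, assuming the failing strings come from source text that is being compiled or evaluated. The helper name literal_is_valid is hypothetical, and both exception types are caught because Python 2 raises ValueError here while Python 3 raises SyntaxError.

```python
def literal_is_valid(source):
    """Return True if `source` parses as a Python expression.

    `literal_is_valid` is a hypothetical helper for illustration only.
    Python 2 reports a bad \\x escape as ValueError, Python 3 as SyntaxError.
    """
    try:
        compile(source, "<literal>", "eval")
    except (ValueError, SyntaxError):
        return False
    return True


# Raw strings keep the backslashes, so each argument is the literal's source text.
print(literal_is_valid(r"'\xA9'"))   # two hex digits: parses
print(literal_is_valid(r"'\xOO'"))   # letter O, not zero: rejected by the parser
print(literal_is_valid(r"'\x5'"))    # only one hex digit: rejected by the parser
print(literal_is_valid(r"r'\xOO'"))  # raw literal: no escape processing, parses
```

If the backslash is meant literally, writing the literal as a raw string (r'\xOO') or doubling the backslash ('\\xOO') avoids the error altogether.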

0

To convert a str to unicode at all, you need to know the encoding of the data in the str. UTF-8 is common, but it is not the only encoding out there.

Additionally, you may receive str data that is not in any text encoding (such as arbitrary binary data). In that case you cannot convert it to unicode faithfully, which leaves two options: a) raise an exception, or b) convert as much as you can and ignore the errors. Which one is right depends on the application.
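
The two options can be sketched as follows. This is only an illustration under stated assumptions: the helper name to_text and the sample byte strings are made up for this example, and the code is written to run on both Python 2 and Python 3 (where the text type is str rather than unicode).

```python
from __future__ import print_function

# unicode on Python 2, str on Python 3 (assumption: both should be supported)
text_type = type(u"")


def to_text(value, encoding="utf-8", errors="strict"):
    """Return `value` as text, decoding byte strings with `encoding`.

    `to_text` is a hypothetical helper; the caller has to supply the
    correct encoding, because it cannot be reliably guessed from the bytes.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(encoding, errors)


valid = b"caf\xc3\xa9"  # UTF-8 encoded bytes
broken = b"caf\xe9"     # Latin-1 encoded bytes, not valid UTF-8

print(repr(to_text(valid)))

# Option a) strict decoding raises, so bad input is noticed early.
try:
    to_text(broken)
except UnicodeDecodeError:
    print("strict decoding failed")

# Option b) lossy decoding: 'ignore' drops the bad byte,
# 'replace' substitutes U+FFFD so the loss stays visible.
print(repr(to_text(broken, errors="ignore")))
print(repr(to_text(broken, errors="replace")))

# With the right encoding, nothing is lost.
print(repr(to_text(broken, encoding="latin-1")))
```

Strict decoding is usually the safer default for production code, since silently ignoring errors hides the fact that the assumed encoding was wrong.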
