125

I used this:

u = unicode(text, 'utf-8')

But I'm getting an error with Python 3 (or maybe I just forgot to include something):

NameError: global name 'unicode' is not defined

Thank you.

2
  • 19
    If there's one compelling reason to upgrade to Python 3, it's Unicode by default. Commented Jul 25, 2011 at 5:49
  • text.encode('unicode_escape') would be enough I guess Commented Sep 5, 2021 at 16:10

5 Answers

167

Literal strings are Unicode by default in Python 3.

Assuming that text is a bytes object, just use text.decode('utf-8')

Python 2's unicode is equivalent to str in Python 3, so you can also write:

str(text, 'utf-8')

if you prefer.


4 Comments

TypeError: decoding str is not supported
@Gank, in Python 3 a str is Unicode, i.e. it is already "decoded", so it makes no sense to call decode on it
Same TypeError. Please just replace with str(txt), or the code from @magicrebirth below
The original sample is not clear. In Python 3, if you want to do str(text, 'utf-8'), text must be a bytes object, e.g. str(b'this is a binary', 'utf-8')
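A minimal sketch of the distinction the comments above are circling: decoding applies to bytes, while a str is already decoded, so calling str(text, 'utf-8') on a str raises the TypeError quoted above.

```python
raw = b'caf\xc3\xa8'            # bytes, UTF-8 encoded

print(raw.decode('utf-8'))       # -> 'cafè'
print(str(raw, 'utf-8'))         # same result

try:
    str('cafè', 'utf-8')         # already a str, nothing to decode
except TypeError as e:
    print(e)                     # "decoding str is not supported"
```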
12

What's new in Python 3.0 says:

All text is Unicode; however encoded Unicode is represented as binary data

If you want to ensure you are outputting utf-8, here's an example from this page on unicode in 3.0:

b'\x80abc'.decode("utf-8", "strict")

1 Comment

this is exactly what we need for '\x80abc'.decode("utf-8", "strict") in Python 2, thanks
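To expand on the docs example above (a sketch): 0x80 is not a valid UTF-8 start byte, so the "strict" handler raises, while the other standard error handlers recover in different ways.

```python
data = b'\x80abc'  # 0x80 is invalid as a UTF-8 start byte

# 'strict' (the default) raises UnicodeDecodeError
try:
    data.decode('utf-8', 'strict')
except UnicodeDecodeError as e:
    print(e.reason)

# 'replace' substitutes U+FFFD; 'ignore' drops the bad byte
print(data.decode('utf-8', 'replace'))  # -> '\ufffdabc'
print(data.decode('utf-8', 'ignore'))   # -> 'abc'
```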
9

As a workaround, I've been using this:

# Define `unicode` on Python 3, where the built-in no longer exists,
# so code written for Python 2 keeps working.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    unicode = lambda s: str(s)

3 Comments

Why are you using a lambda function? These methods are called the same way in any case. This is a simpler variation: try: unicode = str; except: pass.
It seems like you can just do unicode = str since it won't fail in either 2 or 3
Or from six import u as unicode which I'd prefer simply because it's more self-documenting (since six is a 2/3 compatibility layer) than unicode = str
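A sketch of the simpler variation suggested in the comments: aliasing the name directly to str also preserves the two-argument (bytes, encoding) call signature that Python 2's unicode supported, which the one-argument lambda above does not.

```python
# Compatibility shim: on Python 3, bind the missing built-in to str.
try:
    unicode            # defined on Python 2
except NameError:
    unicode = str      # Python 3

print(unicode(b'caf\xc3\xa8', 'utf-8'))  # -> 'cafè'
print(unicode('hello'))                  # -> 'hello'
```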
5

This is how I solved my problem converting escapes like \uFE0F, \u000A, etc., and also emojis encoded as UTF-16 surrogate pairs.

import codecs

example = 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\\uD83D\\uDE0D\\uD83D\\uDE0D\\u2764\\uFE0F Present Moment Caf\\u00E8 in St.Augustine\\u2764\\uFE0F\\u2764\\uFE0F '
new_str = codecs.unicode_escape_decode(example)[0]
print(new_str)
# 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
new_new_str = new_str.encode('utf-16', errors='surrogatepass').decode('utf-16')
print(new_new_str)
# 'raw vegan chocolate cocoa pie w chocolate & vanilla cream😍😍❤️ Present Moment Cafè in St.Augustine❤️❤️ '
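A possible alternative (a sketch, assuming the input contains only JSON-style \uXXXX escapes and no unescaped quotes or backslashes): json.loads pairs the surrogates in one step, avoiding the separate UTF-16 round trip.

```python
import json

# Shortened sample with the same kinds of escapes as above
escaped = 'cream\\uD83D\\uDE0D\\u2764\\uFE0F Caf\\u00E8'
decoded = json.loads('"%s"' % escaped)  # wrap in quotes to form a JSON string
print(decoded)  # -> 'cream😍❤️ Cafè'
```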

1 Comment

-1

In a Python 2 program that I used for many years there was this line:

ocd[i].namn=unicode(a[:b], 'utf-8')

This did not work in Python 3.

However, the program turned out to work with:

ocd[i].namn=a[:b]

I don't remember why I put unicode there in the first place, but I think it was because the name can contain the Swedish letters åäöÅÄÖ. But even they work without "unicode".

