125

I used this:

u = unicode(text, 'utf-8')

But I'm getting an error with Python 3 (or maybe I just forgot to include something):

NameError: global name 'unicode' is not defined

Thank you.

2
  • 19
    If there's one compelling reason to upgrade to Python 3, it's Unicode by default. Commented Jul 25, 2011 at 5:49
  • text.encode('unicode_escape') would be enough I guess Commented Sep 5, 2021 at 16:10

5 Answers

167

Literal strings are Unicode by default in Python 3.

Assuming that text is a bytes object, just use text.decode('utf-8')

Python 2's unicode is equivalent to str in Python 3, so you can also write:

str(text, 'utf-8')

if you prefer.


4 Comments

TypeError: decoding str is not supported
@Gank, in Python 3 a str is Unicode, i.e. it is already "decoded", so it makes no sense to call decode on it
Same TypeError. Please just replace with str(txt), or the code from @magicrebirth below
The original sample is not clear. In Python 3, if you want to do str(text, 'utf-8'), text must be a bytes object, e.g. str(b'this is a binary', 'utf-8')
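A minimal sketch of the distinction the comments above are circling: decoding applies to bytes, while a str is already decoded, so calling str(text, 'utf-8') on a str raises the TypeError quoted above.

```python
raw = b'caf\xc3\xa8'            # bytes, UTF-8 encoded

print(raw.decode('utf-8'))       # -> 'cafè'
print(str(raw, 'utf-8'))         # same result

try:
    str('cafè', 'utf-8')         # already a str, nothing to decode
except TypeError as e:
    print(e)                     # "decoding str is not supported"
```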
12

What's new in Python 3.0 says:

All text is Unicode; however encoded Unicode is represented as binary data

If you want to ensure you are outputting utf-8, here's an example from this page on unicode in 3.0:

b'\x80abc'.decode("utf-8", "strict")

1 Comment

this is exactly what we need for '\x80abc'.decode("utf-8", "strict") in Python 2, thanks
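To expand on the docs example above (a sketch): 0x80 is not a valid UTF-8 start byte, so the "strict" handler raises, while the other standard error handlers recover in different ways.

```python
data = b'\x80abc'  # 0x80 is invalid as a UTF-8 start byte

# 'strict' (the default) raises UnicodeDecodeError
try:
    data.decode('utf-8', 'strict')
except UnicodeDecodeError as e:
    print(e.reason)

# 'replace' substitutes U+FFFD; 'ignore' drops the bad byte
print(data.decode('utf-8', 'replace'))  # -> '\ufffdabc'
print(data.decode('utf-8', 'ignore'))   # -> 'abc'
```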
9

As a workaround, I've been using this:

# Define `unicode` on Python 3, where the built-in no longer exists,
# so code written for Python 2 keeps working.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    unicode = lambda s: str(s)

3 Comments

Why are you using a lambda function? These methods are called the same way in any case. This is a simpler variation: try: unicode = str; except: pass.
It seems like you can just do unicode = str since it won't fail in either 2 or 3
Or from six import u as unicode which I'd prefer simply because it's more self-documenting (since six is a 2/3 compatibility layer) than unicode = str
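A sketch of the simpler variation suggested in the comments: aliasing the name directly to str also preserves the two-argument (bytes, encoding) call signature that Python 2's unicode supported, which the one-argument lambda above does not.

```python
# Compatibility shim: on Python 3, bind the missing built-in to str.
try:
    unicode            # defined on Python 2
except NameError:
    unicode = str      # Python 3

print(unicode(b'caf\xc3\xa8', 'utf-8'))  # -> 'cafè'
print(unicode('hello'))                  # -> 'hello'
```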
5

This is how I solved my problem converting escapes like \uFE0F, \u000A, etc., and also emojis encoded as UTF-16 surrogate pairs.

import codecs

example = 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\\uD83D\\uDE0D\\uD83D\\uDE0D\\u2764\\uFE0F Present Moment Caf\\u00E8 in St.Augustine\\u2764\\uFE0F\\u2764\\uFE0F '
new_str = codecs.unicode_escape_decode(example)[0]
print(new_str)
# 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
new_new_str = new_str.encode('utf-16', errors='surrogatepass').decode('utf-16')
print(new_new_str)
# 'raw vegan chocolate cocoa pie w chocolate & vanilla cream😍😍❤️ Present Moment Cafè in St.Augustine❤️❤️ '
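A possible alternative (a sketch, assuming the input contains only JSON-style \uXXXX escapes and no unescaped quotes or backslashes): json.loads pairs the surrogates in one step, avoiding the separate UTF-16 round trip.

```python
import json

# Shortened sample with the same kinds of escapes as above
escaped = 'cream\\uD83D\\uDE0D\\u2764\\uFE0F Caf\\u00E8'
decoded = json.loads('"%s"' % escaped)  # wrap in quotes to form a JSON string
print(decoded)  # -> 'cream😍❤️ Cafè'
```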

1 Comment

-1

In a Python 2 program that I used for many years there was this line:

ocd[i].namn=unicode(a[:b], 'utf-8')

This did not work in Python 3.

However, the program turned out to work with:

ocd[i].namn=a[:b]

I don't remember why I put unicode there in the first place, but I think it was because the name can contain the Swedish letters åäöÅÄÖ. But even they work without "unicode".

