Strange behavior when comparing unicode objects with string objects

Question

When comparing two strings in Python, it works fine, and when comparing a string object with a unicode object it fails as expected. However, when comparing a string object with a converted unicode object (unicode --> str), it also fails.

A Demo:

Works as expected:

>>> if 's' is 's': print "Hurrah!"
... 
Hurrah!

Fails, pretty much as expected:

>>> if 's' is u's': print "Hurrah!"
...

Not expected:

>>> if 's' is str(u's'): print "Hurrah!"
...

Why doesn't the third example work as expected when both objects are of the same type?

>>> type('s')
<type 'str'>

>>> type(str(u's'))
<type 'str'>

everything you ever wanted to know about string interning

– shx2, Commented Dec 7, 2013 at 7:01

mgilson · Accepted Answer · 2013-12-07 07:09:46Z

Don't use is for this, use ==. You're comparing whether the objects have the same identity, not whether they are equal. Of course, if they are the same object, they will be equal (==), but if they are equal, they aren't necessarily the same object.
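
A minimal sketch of that distinction, written for Python 3 on CPython (an assumption; the question itself uses Python 2, where the str/unicode split still exists). The names a and b are only illustrative, and the second string is assembled at runtime so that no literal caching is involved:

```python
# Build the same text twice; the second copy is assembled at runtime,
# so CPython allocates a fresh object for it.
a = "hello world"
b = " ".join(["hello", "world"])

equal = (a == b)        # compares the characters
same_object = (a is b)  # compares object identity

print(equal)        # True
print(same_object)  # False on CPython; identity of equal strings is never guaranteed
```

Only the == result is something a program should rely on.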

The fact that the first one works is an implementation detail of CPython. Small strings, since they're immutable, can be interned by the interpreter. Every time you put the string "s" in your source code, CPython reuses the same object. However, str(u"s") apparently returns a new string with the same value. This isn't all that surprising.
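
Interning can also be requested explicitly. The sketch below assumes Python 3, where the function lives at sys.intern (in Python 2 it was the builtin intern); the variable names are made up for illustration:

```python
import sys

literal = "hello world"
built = " ".join(["hello", "world"])  # equal value, assembled at runtime

# sys.intern returns the canonical copy of a string from the interpreter's
# intern table, so equal strings map to one shared object.
interned_literal = sys.intern(literal)
interned_built = sys.intern(built)

print(built == literal)                    # True
print(interned_built is interned_literal)  # True
```

After both values pass through sys.intern, the identity check succeeds because both names refer to the single interned copy.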

You might be asking yourself, "why intern the string 's' at all?". That's a reasonable question. After all, it's a short string -- how much memory could having multiple copies floating around in your source take? The answer (I think) is dictionary lookups. Since dicts with strings as keys are so common in Python, you can speed up the equality checking of keys during lookups by doing lightning-fast pointer comparisons, falling back on the slower strcmp only when the pointer comparison returns false.
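
The identity-first shortcut in dict lookups can be observed from Python, at least on CPython. String comparison cannot be instrumented directly, so this sketch uses a hypothetical CountingKey class (not part of any library) that counts how often its __eq__ runs:

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ runs."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


key = CountingKey("s")
table = {key: "found"}

CountingKey.eq_calls = 0
table[key]                      # same object: the identity check succeeds first
calls_same_object = CountingKey.eq_calls

CountingKey.eq_calls = 0
table[CountingKey("s")]         # equal but distinct object: __eq__ is needed
calls_equal_copy = CountingKey.eq_calls

print(calls_same_object, calls_equal_copy)  # 0 1 on CPython
```

Looking up with the very same object skips the equality call entirely, which is the saving that interned string keys would benefit from.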

inspectorG4dget · Accepted Answer · 2013-12-07 07:08:01Z

The is operator compares the memory locations (identities) of its two operands. Because strings are immutable, CPython can reuse a single object for both 's' literals, so here they occupy the same location in memory.

~~Due to the way unicode is handled in python2.7, u's' and 's' are stored in the same way/place. Therefore, they occupy the same memory location. Therefore 's' is u's' evaluates to True.~~
As @mgilson points out, 's' and u's' are of different types and therefore don't occupy the same memory location, so 's' is u's' evaluates to False.

However, when you call str(u's'), a new string is created and returned. This new string, because it is created anew, lives in a new location in memory, which is why the is comparison fails.

What you really want is to check that they are equivalent strings, so use ==:

In [1]: 's' == u's'
Out[1]: True

In [2]: 's' == 's'
Out[2]: True

In [3]: 's' == str(u's')
Out[3]: True

edited Dec 7, 2013 at 7:08

answered Dec 7, 2013 at 7:02

1 Comment

mgilson Over a year ago

"s" is u"s" shouldn't evaluate to True. They're different types... (at least on Python 2.x. On Python 3.3, which reintroduced the u literal, I suppose you could get that check to be True...)

Steve P. · Accepted Answer · 2013-12-07 07:10:31Z

Use == for value comparison and is for reference comparison. is evaluates to True only when both operands have the same id. str() returns a new object with a different id, so you get False.

edited Dec 7, 2013 at 7:10

answered Dec 7, 2013 at 6:58
