6

When comparing two strings in Python with is, it works fine, and when comparing a str object with a unicode object it fails, as expected. However, comparing a str object with a unicode object that has been converted to str (unicode --> str) also fails.

A Demo:

Works as expected:

>>> if 's' is 's': print "Hurrah!"
... 
Hurrah!

Pretty much yeah:

>>> if 's' is u's': print "Hurrah!"
... 

Not expected:

>>> if 's' is str(u's'): print "Hurrah!"
... 

Why doesn't the third example work as expected when both objects are of the same type?

>>> type('s')
<type 'str'>

>>> type(str(u's'))
<type 'str'>
1 Comment

everything you ever wanted to know about string interning (commented Dec 7, 2013 at 7:01)

3 Answers

12

Don't use is for this, use ==. With is you're checking whether the objects have the same identity, not whether they are equal. Of course, if they are the same object, they will be equal (==), but if they are equal, they aren't necessarily the same object.

The fact that the first one works is an implementation detail of CPython. Small strings, since they're immutable, can be interned by the interpreter. Every time you put the string "s" in your source code, CPython reuses the same object. However, apparently str(u"s") returns a new string with the same value. This isn't all that surprising.
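To make the identity/equality distinction concrete, here is a minimal sketch. It uses a longer string built at runtime rather than 's', because whether two equal strings happen to share one object is up to the implementation; the variable names are only illustrative. The try/except is there so the same code runs on Python 2 (builtin intern) and Python 3 (sys.intern).

```python
import sys

# Python 3 moved intern() to sys.intern; on Python 2 the builtin stays in scope.
try:
    intern = sys.intern
except AttributeError:
    pass

a = "hello world"
b = "".join(["hello", " ", "world"])  # assembled at runtime, same value as a

values_equal = (a == b)   # compares the contents of the two strings
same_object = (a is b)    # compares identity; the result is implementation-dependent

# Explicit interning maps equal strings to one shared object.
interned_same = (intern(a) is intern(b))

print(values_equal)
print(interned_same)
```

Only the == result and the explicitly interned comparison are things you can rely on; the plain is result may differ between interpreters and versions.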


You might be asking yourself, "why intern the string 's' at all?". That's a reasonable question. After all, it's a short string -- how much memory could having multiple copies floating around in your source take? The answer (I think) is because of dictionary lookups. Since dicts with strings as keys are so common in Python, you can speed up the equality checking of keys by doing lightning-fast pointer comparisons, falling back on a slower strcmp only when the pointer comparison returns false.
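The identity-first shortcut can be observed from Python itself. The sketch below assumes CPython's dict behaviour and uses a hypothetical Key class (not part of any library) that counts how often its == is evaluated during a lookup.

```python
class Key(object):
    """Hypothetical key type that counts how often == is evaluated."""
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name


stored = Key("s")
table = {stored: "value"}

# Lookup with the very same object: the identity check can answer on its own.
Key.eq_calls = 0
table[stored]
calls_same_object = Key.eq_calls

# Lookup with an equal but distinct object: the dict has to fall back to ==.
Key.eq_calls = 0
table[Key("s")]
calls_equal_copy = Key.eq_calls

print(calls_same_object)
print(calls_equal_copy)
```

On CPython the first lookup should not need to call __eq__ at all, while the second one does; interned string keys benefit from the same shortcut.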

3

The is operator compares the identity of its two operands (in CPython, effectively their memory location). Since strings are immutable, CPython can reuse a single object for the literal 's', so 's' and 's' occupy the same location in memory.

An earlier version of this answer claimed that, due to the way unicode is handled in Python 2.7, u's' and 's' are stored in the same place and that 's' is u's' therefore evaluates to True. That is incorrect. As @mgilson points out, 's' and u's' are of different types, and therefore don't occupy the same memory location, leading to 's' is u's' evaluating to False.

However, when you call str(u's'), a new string is created and returned. This new string, because it is created anew, lives in a new location in memory, which is why the is comparison fails.

What you really want is to check that they are equivalent strings, so use ==:

In [1]: 's' == u's'
Out[1]: True

In [2]: 's' == 's'
Out[2]: True

In [3]: 's' == str(u's')
Out[3]: True

1 Comment

"s" is u"s" shouldn't evaluate to True. They're different types... (at least on python2.x -- Python 3.3, when they reintroduced the u literal I suppose you could get that check to be True...)
2

Use == for value comparison and is for reference comparison. is evaluates to True only if both operands have the same id; str() returns a new object with a different id, so you get False.
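A small sketch of the relationship between is and id(), using illustrative variable names. For two objects that are alive at the same time, a is b gives the same answer as id(a) == id(b).

```python
a = "hello world"
alias = a                              # a second name for the same object
b = "".join(["hello", " ", "world"])   # equal value, built separately at runtime

same_identity = (a is alias)           # both names refer to one object
ids_match = (id(a) == id(alias))       # so their ids agree as well
values_equal = (a == b)                # equal contents, whatever the ids are

print(same_identity, ids_match, values_equal)
```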
