Confused about `is` operator with strings

Question

The is operator compares the memory addresses of two objects and returns True if they're the same. Why, then, does it not work reliably with strings?

Code #1

>>> a = "poi"
>>> b = "poi"
>>> a is b
True

Code #2

>>> ktr = "today is a fine day"
>>> ptr = "today is a fine day"
>>> ktr is ptr
False

I have created two strings whose content is the same, but they live at different memory addresses. Why is the output of the is operator inconsistent?

In practice you will likely never use is unless you are doing something fairly complicated. You probably want == for most equality comparisons.
– dkamins, Commented Oct 25, 2012 at 5:38
@dkamins: Well, the recommended way for testing for None uses 'is': some_var is None (because there's always only a single instance of None). I'd say that's a pretty common case.
– voithos, Commented Oct 25, 2012 at 5:42
@voithos True - I use that (and is not None) all the time and didn't even think of it! But aside from that...
– dkamins, Commented Oct 25, 2012 at 5:44
@dkamins It's also commonly used for True and False, as they're also singletons.
– agf, Commented Oct 25, 2012 at 5:56
@voithos Indeed, however I rarely find myself using those idioms. if something or if not something usually reads clearer to me than if something is True or if something is False.
– dkamins, Commented Oct 26, 2012 at 1:17

voithos · Accepted Answer · 2012-10-25 06:01:37Z

6

I believe it has to do with string interning. In essence, the idea is to store only a single copy of each distinct string, to increase performance on some operations.

Basically, a is b works because (as you may have guessed) both names refer to a single immutable string object. When a string is large (and most likely depending on other factors that I don't understand), this isn't done, which is why your second example returns False.

EDIT: In fact, the odd behavior seems to be a side effect of the interactive environment. If you put the same code into a Python script, both a is b and ktr is ptr return True.

a = "poi"
b = "poi"
print(a is b)  # Prints 'True'

ktr = "today is a fine day"
ptr = "today is a fine day"
print(ktr is ptr)  # Prints 'True'

This makes sense, since it'd be easy for Python to parse a source file and look for duplicate string literals within it. If you create the strings dynamically, then it behaves differently even in a script.

a = "p" + "oi"
b = "po" + "i"
print(a is b)  # Oddly enough, prints 'True'

ktr = "today is" + " a fine day"
ptr = "today is a f" + "ine day"
print(ktr is ptr)  # Prints 'False'

As for why a is b still results in True, perhaps the allocated string is small enough to warrant a quick search through the interned collection, whereas the other one is not?
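To take the compiler out of the picture, here is a minimal sketch that builds the same text twice at run time. Using str.join is a choice made for this illustration, and the names parts, first and second are hypothetical. The point is only that equality is dependable while identity is left to the implementation.

```python
# Build the same text twice at run time, so the compiler cannot fold it
# into a single shared constant.
parts = ["today is", " a fine day"]
first = "".join(parts)
second = "".join(parts)

# Value comparison is always reliable.
print(first == second)

# Identity is an implementation detail here: it may print True or False.
print(first is second)
```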

edited Oct 25, 2012 at 6:01

answered Oct 25, 2012 at 5:38

voithos

1 Comment

Eryk Sun Over a year ago

Regarding a = "p" + "oi"; b = "po" + "i", the compiler optimizes the simple string concatenation by storing two references to "poi" in the code object, which are interned in place. "today is a fine day" isn't interned because it contains spaces (i.e. non-name characters).

Community · Accepted Answer · 2017-05-23 12:21:18Z

3

is is identity testing. It will work on some strings (because of a cache) but not on other strings, since str is NOT a ptr. [thanks erykson]

See this code:

>>> import dis
>>> def fun():
...   str = 'today is a fine day'
...   ptr = 'today is a fine day'
...   return (str is ptr)
...
>>> dis.dis(fun)
  2           0 LOAD_CONST               1 ('today is a fine day')
              3 STORE_FAST               0 (str)

  3           6 LOAD_CONST               1 ('today is a fine day')
              9 STORE_FAST               1 (ptr)

  4          12 LOAD_FAST                0 (str)
             15 LOAD_FAST                1 (ptr)
             18 COMPARE_OP               8 (is)
             21 RETURN_VALUE

>>> id(str)
26652288
>>> id(ptr)
27604736
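# hence this comparison returns False: ptr is str

Notice that the IDs of str and ptr are different. (Inside fun, the disassembly shows both names loading the same constant, index 1, so the IDs above presumably come from assigning the two strings in separate statements at the prompt.)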

BUT:

>>> x = "poi"
>>> y = "poi"
>>> id(x)
26650592
>>> id(y)
26650592
# hence this comparison returns True: x is y

The IDs of x and y are the same. Hence the is operator works on "ids", not on "equalities".

See the link below for a discussion of when and why Python allocates a different memory location for identical strings (read the question as well).

When does python allocate new memory for identical strings

Also, sys.intern on Python 3.x and intern on Python 2.x should help you allocate the strings in the same memory location, regardless of the size of the string.
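As a rough sketch of that last point (Python 3 only, since it uses sys.intern), the strings below are built at run time and then interned. The variable parts is a hypothetical helper for the illustration; ktr and ptr reuse the names from the question.

```python
import sys

parts = ["today is", " a fine day"]

# Two equal strings built at run time, each passed through sys.intern.
ktr = sys.intern("".join(parts))
ptr = sys.intern("".join(parts))

print(ktr == ptr)  # True
print(ktr is ptr)  # True: both names refer to the single interned copy
```

Interning only makes identity line up with equality for strings that have all been interned, so == is still the comparison to reach for when you care about content.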

edited May 23, 2017 at 12:21

CommunityBot

answered Oct 25, 2012 at 5:33

Aniket Inge

8 Comments

navyad Over a year ago

How small can a string be, for that matter?

Matthew Adams Over a year ago

You can have an empty string "". This is not the whole story though, because two strings can be set to the same value, so that is returns true, and then you can modify one and undo the modification and the result won't be true any more.
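A small sketch of the modify-and-undo case described in this comment. The exact steps (appending a character and slicing it off again) are an assumption chosen for illustration, not necessarily what the commenter had in mind.

```python
a = "poi"
b = "poi"

b += "!"     # modify: b now names a new, longer string
b = b[:-1]   # undo: slicing builds another new string with the text "poi"

print(a == b)  # True: the contents match again
print(a is b)  # not guaranteed; on CPython this typically prints False
```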

agf Over a year ago

You can certainly have larger strings than this interned. While you're right that the short string is and the larger string isn't, "it works with shorter strings" isn't the whole answer.

Eryk Sun Over a year ago

It has nothing to do with the size of the string. If a string constant (not from an expression) is all "name characters", it will be interned when the code object is created. Plus implementation strings used for variable names, attributes, etc.

Eryk Sun Over a year ago

See here and here in codeobject.c (2.7.3 source).

Matthew Adams · Accepted Answer · 2012-10-25 05:53:12Z

2

is is not the same as ==.

Basically, is checks whether the two objects are the same object, while == compares the values of those objects (strings, like everything in Python, are objects).
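The difference is easiest to see with mutable objects, where no caching gets in the way. This is a minimal sketch using lists; the names first, second and alias are hypothetical.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with equal contents
alias = first        # a second name for the same list object

print(first == second)  # True: same values
print(first is second)  # False: two distinct objects
print(first is alias)   # True: one object, two names

# Because alias and first are the same object, a change through one
# name is visible through the other.
alias.append(4)
print(first)            # [1, 2, 3, 4]
```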

So you should use is when you really know what objects you're looking at (i.e. you've made the objects, or are comparing with None as the question comments point out), and you want to know if two variables are referencing the exact same object in memory.
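The None case mentioned above might look like the following sketch. The function describe and its parameter value are hypothetical names made up for the example.

```python
def describe(value=None):
    """Return a short label for value (hypothetical helper)."""
    # None is a singleton, so an identity test is the idiomatic check.
    if value is None:
        return "nothing given"
    return "got %r" % (value,)

print(describe())    # nothing given
print(describe(0))   # got 0
print(describe(""))  # got ''
```

Testing with is None rather than truthiness keeps falsy but legitimate arguments such as 0 and the empty string from being mistaken for a missing value.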

In your examples, however, you're looking at str objects that Python is handling behind the scenes, so without diving deep into how Python works, you don't really know what to expect. You would have the same problem with ints or floats. Other answers do a good job of explaining the "behind the scenes" stuff (string interning), but you mostly shouldn't have to worry about it in day-to-day programming.

edited Oct 25, 2012 at 5:53

answered Oct 25, 2012 at 5:34

Matthew Adams

2 Comments

agf Over a year ago

So why are the two examples different? This doesn't explain.

Matthew Adams Over a year ago

@agf Just edited. Does that better explain what I meant my answer to be?

asmeurer · Accepted Answer · 2012-10-25 06:26:01Z

1

Note that this is a CPython-specific optimization. If you want your code to be portable, you should avoid relying on it. For example, in PyPy:

>>>> a = "hi"
>>>> b = "hi"
>>>> a is b
False

It's also worth pointing out that a similar thing happens for small integers:

>>> a = 12
>>> b = 12
>>> a is b
True

which again you should not rely on, because other implementations might not include this optimization.
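A portable version of these comparisons sticks to ==. The sketch below builds the integers at run time with int() so no compile-time sharing is involved; the variable names and the particular values are assumptions picked for illustration.

```python
small_a = int("12")
small_b = int("12")
big_a = int("1000000")
big_b = int("1000000")

# Equality is defined by the language and holds on every implementation.
print(small_a == small_b, big_a == big_b)  # True True

# Identity depends on the implementation's caching, so either line below
# may print True or False.
print(small_a is small_b)
print(big_a is big_b)
```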

answered Oct 25, 2012 at 6:26

asmeurer
