I know that Python strings are immutable, which means that

letters = "world"
letters += "sth"

would give me a different string object after the concatenation

begin: id(letters): 1828275686960
end: id(letters): 1828278265776

However, when I run a for-loop that appends to a string, the string object appears to remain the same throughout the loop:

letters = "helloworld"
print("before for-loop:")
print(id(letters))
print("in for-loop")

for i in range(5):
    letters += str(i)
    print(id(letters))

The output:

before for-loop:
2101555236144
in for-loop
2101557044464
2101557044464
2101557044464
2101557044464
2101557044464

Apparently the underlying string object that letters points to did not change during the for-loop, which seems to contradict the idea that strings are immutable.

Is this some kind of optimization that Python performs under the hood?

  • Why do you print letters.__repr__ and not id(letters) as above? Commented Oct 13, 2021 at 13:18
  • += is invoking __iadd__ method. We must take a look at that method. Commented Oct 13, 2021 at 13:22
  • Note that the output does change at some point if you increase 5 to a larger number. Commented Oct 13, 2021 at 13:25
  • Note that the output is also the same if you unroll the loop manually. Commented Oct 13, 2021 at 13:26
  • This seems to be an optimization in CPython. Does this make sense: web.eecs.utk.edu/~azh/blog/pythonstringsaremutable.html ? Commented Oct 13, 2021 at 13:37

1 Answer

From the documentation:

id(object)

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.
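
A minimal sketch of what that guarantee does and does not promise. The Token class is a hypothetical placeholder introduced only for this illustration. Objects that are alive at the same time always have distinct ids, while an id may be handed out again once its object is gone; whether that reuse actually happens depends on the allocator, so it is printed here rather than assumed.

```python
class Token:
    """Trivial placeholder class (hypothetical, for illustration only)."""


# Objects that are alive at the same time never share an id.
alive = [Token() for _ in range(1000)]
ids_are_unique = len({id(obj) for obj in alive}) == len(alive)
print(ids_are_unique)  # True

# Once an object is destroyed, its id may be given to a later object.
first = Token()
old_id = id(first)
del first  # the first Token's lifetime ends here on CPython
second = Token()
print(old_id == id(second))  # often True on CPython, but not guaranteed
```

So seeing the same id() twice does not by itself prove that the same object was observed twice.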

In CPython, id() therefore returns the memory address at which the string is stored, as the source code shows:

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }

    return id;
} 

What happens is that the end of life of the old string and the beginning of life of the new one coincide, so their lifetimes do not overlap and the two may share the same id(). Python guarantees the immutability of a string only for as long as it is alive. As the article suggested by @kris shows:

import _ctypes
    
a = "abcd"
a += "e"

before_f_id = id(a)

a += "f"

print(a)
print( _ctypes.PyObj_FromPtr(before_f_id) ) # prints: "abcdef"

The original string a has ended its life, and it is not guaranteed to be retrievable from its memory location; in fact, the example above shows that the location is reused for the new string.

We can see how this is implemented under the hood by looking at the last lines of the unicode_concatenate function:

res = v;
PyUnicode_Append(&res, w);
return res;

where v and w are those in the expression: v += w
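
As a quick check that this is interpreter-level behavior rather than a str method, the sketch below relies only on documented language semantics: str defines no __iadd__, so v += w falls back to __add__ and rebinds the name, whereas list does define __iadd__ and is mutated in place. The variable names are illustrative.

```python
# str has no in-place add method; list does.
str_has_iadd = hasattr(str, "__iadd__")
list_has_iadd = hasattr(list, "__iadd__")
print(str_has_iadd, list_has_iadd)  # False True

# A list is mutated in place, so the alias still refers to the same object.
items = [1, 2]
items_alias = items
items += [3]
print(items is items_alias, items_alias)  # True [1, 2, 3]

# A str name is rebound to a new object; the alias keeps the old value.
text = "ab"
text_alias = text
text += "c"
print(text is text_alias, text_alias)  # False ab
```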

PyUnicode_Append does in fact try to reuse the same memory location for the new object. In detail:

PyUnicode_Append(PyObject **p_left, PyObject *right):

...

new_len = left_len + right_len;

if (unicode_modifiable(left)
    ...
{
    /* append inplace */
    if (unicode_resize(p_left, new_len) != 0)
        goto error;

    /* copy 'right' into the newly allocated area of 'left' */
    _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
}
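
The effect of the unicode_modifiable check can be observed from Python. To my understanding of the CPython source, one of its conditions is that nothing else holds a reference to the left operand, so keeping an alias to each intermediate string should force a fresh object on every append. The sketch below uses a hypothetical helper, append_digits, that counts the distinct ids seen during the loop. Only the aliased case has a guaranteed count; the count without aliases is a CPython implementation detail and may vary between versions.

```python
def append_digits(text, count, keep_alias):
    """Append the digits 0..count-1 to text; return the result and the
    number of distinct id() values observed after each append."""
    seen_ids = set()
    aliases = []
    for i in range(count):
        if keep_alias:
            # A second reference keeps the old string alive, so it cannot
            # be resized in place and a new object must be created.
            aliases.append(text)
        text += str(i)
        seen_ids.add(id(text))
    return text, len(seen_ids)


with_alias, ids_with_alias = append_digits("helloworld", 5, keep_alias=True)
no_alias, ids_no_alias = append_digits("helloworld", 5, keep_alias=False)

print(with_alias, ids_with_alias)  # helloworld01234 5
print(no_alias, ids_no_alias)      # same string; the id count is usually smaller on CPython
```

In the aliased run every intermediate string is still alive when the loop ends, so all five ids must differ. That is the immutability guarantee at work: a string that something else can still see is never modified.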