27

I read in a comment here on Stack Overflow that it is more memory efficient to do slice assignment when changing lists. For example,

a[:] = [i + 6 for i in a]

should be more memory efficient than

a = [i + 6 for i in a]

because the former replaces elements in the existing list, while the latter creates a new list and rebinds a to that new list, leaving the old a in memory until it can be garbage collected. Benchmarking the two for speed, the latter is slightly quicker:

$ python -mtimeit -s 'a = [1, 2, 3]' 'a[:] = [i + 6 for i in a]'
1000000 loops, best of 3: 1.53 usec per loop
$ python -mtimeit -s 'a = [1, 2, 3]' 'a = [i + 6 for i in a]'
1000000 loops, best of 3: 1.37 usec per loop

That is what I'd expect, as rebinding a variable should be faster than replacing elements in a list. However, I can't find any official documentation which supports the memory usage claim, and I'm not sure how to benchmark that.

On the face of it, the memory usage claim makes sense to me. However, giving it some more thought, I would expect that in the former method, the interpreter would create a new list from the list comprehension and then copy the values from that list to a, leaving the anonymous list in floating around until it is garbage collected. If that's the case, then the former method would use the same amount of memory while also being slower.

Can anyone show definitively (with a benchmark or official documentation) which of the two methods is more memory efficient/which is the preferred method?

Thanks in advance.

2
  • 2
    The performance aspects might be worth considering, but I think you are more likely to run into the practical case (in larger programs) where you pass a reference to a list, say, from Class1 to Class2. In the first instance, using slice assignment to modify Class1's list will preserve Class2's reference. In the second instance you cite, modifying Class1's list means that Class2 will be holding a reference to a list that is no longer valid. Commented Feb 9, 2011 at 17:59
  • @Brandon: That is also true, and I probably should have mentioned the distinction in my question. Thanks for your input. Commented Feb 9, 2011 at 18:03

1 Answer 1

53

The line

a[:] = [i + 6 for i in a]

would not save any memory. Python does evaluate the right hand side first, as stated in the language documentation:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

In the case at hand, the single resulting object would be a new list, and the single target in the target list would be a[:].

We could replace the list comprehension by a generator expression:

a[:] = (i + 6 for i in a)

Now, the right hand side evaluates to a generator instead of a list. Benchmarking shows that this is still slower than the naive

a = [i + 6 for i in a]

So does the generator expression actually save any memory? At first glance, you might think it does. But delving in to the source code of the function list_ass_slice() shows that it does not. The line

v_as_SF = PySequence_Fast(v, "can only assign an iterable");

uses PySequence_Fast() to convert the iterable (in this case the generator) into a tuple first, which is then copied into the old list. A tuple uses the same amount of memory as a list, so using a generator expression is basically the same as using a list comprehension in this case. During the last copy, the items of the original list are reused.

The moral seems to be that in this case the simplest approach is best in every regard.

Sign up to request clarification or add additional context in comments.

5 Comments

+1 for merciless crushing of the premature (memory) optimizers.
Thanks for the detailed and insightful answer! In response to the commenter above, I'd like to add that this might not have been premature optimization if you were dealing with a list of 5 million elements and had a choice between copying it and not copying it. :)
@Mitch: if you have 5 million entries, you are probably better off with e.g. a NumPy array than a Python list.
Looking in the mechanisms under the hood makes this answer a very good and definitive one. +1
I think it is somewhat reassuring that a[:] = (x + 6 for x in a) doesn't save any memory, and that the generator is consumed before any item is assigned to a[:]. Otherwise, you'd be iterating on a while modifying a at the same time, and it could easily get confusing. Consider a[::-1] = (x + 6 for x in a). At a glance it looks like it should reverse a and add 6 to every item. And indeed, it does. But if the generator was consumed lazily during the assignment, then it'd be equivalent to for i,x in enumerate(a): a[~i] = x + 6, whose exact behaviour is hard to understand at a glance.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.