
I don't quite understand how iterators have memory in Python.

>>> from itertools import izip
>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [2, 3, 4, 5, 6, 7]
>>> iz = izip(l1, l2)

We still require O(len(l1) + len(l2)) memory, since both lists l1 and l2 have to be loaded into memory.

I thought one of the main uses of iterators was to save memory - yet it does not seem to be useful here.
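Concretely, here is a Python 3 sketch of the same situation (in Python 3, izip became the built-in zip). The lists do sit in memory, but the pairing itself is lazy: zip never builds a list of pairs, it pulls one element from each input per step.

```python
# Python 3: zip is lazy, like itertools.izip was in Python 2.
l1 = [1, 2, 3, 4, 5, 6]
l2 = [2, 3, 4, 5, 6, 7]

iz = zip(l1, l2)      # no pairs exist yet
first = next(iz)      # the first pair is produced on demand
print(first)          # (1, 2)
```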

Similarly, the code below is unclear to me:

>>> from itertools import izip
>>> l1 = (n for n in [1, 2, 3, 4, 5, 6])
>>> l2 = (n for n in [2, 3, 4, 5, 6, 7])
>>> iz = izip(l1, l2)

We need to load the lists before wrapping them in generators, right? That means we still spend the memory. So what is the point of generators here, either?

This is the only case that makes sense to me:

from itertools import izip

def build_l1():
    for n in xrange(1, 7):
        yield n

def build_l2():
    for n in xrange(2, 8):
        yield n

l1 = build_l1()
l2 = build_l2()
iz = izip(l1, l2)

Neither sequence is ever materialized in memory, so we're in O(1) memory.
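In Python 3 terms (range is lazy like xrange, and zip is lazy like izip), this pipeline can be sketched end to end; consuming it touches only one pair at a time:

```python
# Python 3 sketch of the same O(1)-memory pipeline.
def build_l1():
    for n in range(1, 7):   # yields 1..6, one value at a time
        yield n

def build_l2():
    for n in range(2, 8):   # yields 2..7, one value at a time
        yield n

pairs = zip(build_l1(), build_l2())
total = sum(a * b for a, b in pairs)  # consumes one pair at a time
print(total)  # 112
```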

How does the memory usage of iterators in Python work? The first two cases still seem to use O(len(l1) + len(l2)) memory, which makes them look useless if the main point of iterators is to save memory.

2 Comments

  • If you iterate over a list, it doesn't save memory. The point is that often you can avoid creating that list in the first place. Also, saving memory is worthwhile even when the savings aren't asymptotic. Commented Feb 1, 2016 at 13:26
  • Your build_l1 and build_l2 don't make much sense; xrange already stores just (from, to, step). Commented Feb 1, 2016 at 13:28

2 Answers


Your examples are too simplistic. Consider this:

nums = [1, 2, 3, 4, 5, 6]
nums_it = (n for n in nums)

nums_it is a generator that yields the items of nums unmodified. Clearly there is no advantage here. But consider this:

squares_it = (n ** 2 for n in nums)

and compare it with:

squares_lst = [n ** 2 for n in nums]

With squares_it, we are generating the squares of nums on the fly only when requested. With squares_lst, we are generating all of them at once and storing them in a new list.
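You can see the difference directly with sys.getsizeof (Python 3 sketch; the exact byte counts vary by interpreter, but the shape of the result doesn't):

```python
import sys

nums = list(range(100_000))

squares_it = (n ** 2 for n in nums)    # generator object: tiny, fixed size
squares_lst = [n ** 2 for n in nums]   # list: one slot per result

print(sys.getsizeof(squares_it))   # on the order of a hundred bytes
print(sys.getsizeof(squares_lst))  # on the order of hundreds of kilobytes
```

The generator's size is constant no matter how long nums is; the list's size grows linearly with it.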

So, when you do:

for n in squares_it:
    print(n)

it's as if you were doing:

for n in nums:
    print(n ** 2)

But when you do:

for n in squares_lst:
    print(n)

it's as if you were doing:

squares_lst = []
for n in nums:
    squares_lst.append(n ** 2)
for n in squares_lst:
    print(n)

If you don't need (or don't have) the list nums, then you can save even more space by using:

squares_it = (n ** 2 for n in xrange(1, 7))

Generators and iterators also provide another significant advantage (which may actually be a disadvantage, depending on the situation): they are evaluated lazily.
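A small sketch of why laziness can surprise you: a generator reads its source when it is consumed, not when it is created, so mutations made in between are visible.

```python
# Lazy evaluation cuts both ways.
nums = [1, 2, 3]
doubled = (n * 2 for n in nums)

nums.append(4)          # mutate the source before consuming

result = list(doubled)
print(result)           # [2, 4, 6, 8] -- the append is visible
```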

Also, generators and iterators may yield an infinite number of elements. An example is itertools.count(), which yields 0, 1, 2, 3, ... without ever ending.
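For example (Python 3), you can build a pipeline over an infinite iterator and use itertools.islice to take only a finite prefix:

```python
from itertools import count, islice

# count() is infinite; the generator filters it lazily,
# and islice takes just the first five results.
evens = (n for n in count() if n % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]
```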


8 Comments

Thanks for the reply. The generator version of squares_lst is still O(n) in memory, though I agree the constant factor is much smaller. Is there a way to achieve O(1) memory using iterators and a pre-defined list?
@zero: squares_it is O(1) memory. Or are you considering the memory for nums too?
The entire algorithm to square those numbers would be O(n), if I am correct?
@zero: yes. But change it to (n ** 2 for n in xrange(1, 7)) and then you have a O(1) space algorithm
@zero: When considering space complexity, you only consider memory in addition to the input.
>>> from itertools import izip
>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [2, 3, 4, 5, 6, 7]
>>> iz = izip(l1, l2)

We still require O(len(l1) + len(l2)) memory, since both lists l1 and l2 have to be loaded into memory.

With zip you need storage for the two original lists plus the zipped list. With izip you don't store the zipped list.
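A Python 3 sketch of that difference (Python 3's zip behaves like izip; wrapping it in list() reproduces Python 2's eager zip):

```python
import sys

l1 = list(range(10_000))
l2 = list(range(1, 10_001))

lazy = zip(l1, l2)          # like izip: stores no pairs
eager = list(zip(l1, l2))   # like Python 2 zip: stores every pair

print(sys.getsizeof(lazy))   # tiny, independent of input length
print(sys.getsizeof(eager))  # grows linearly with the inputs
```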

Big O notation isn't particularly helpful here if you have to work with a real physical machine instead of some abstract concept of a machine. There's a hidden constant multiplier on your O(n) calculations that could influence the practicality of the code well before n tends to infinity.

>>> from itertools import izip
>>> l1 = (n for n in [1, 2, 3, 4, 5, 6])
>>> l2 = (n for n in [2, 3, 4, 5, 6, 7])
>>> iz = izip(l1, l2)

We need to load the lists before wrapping them in generators, right? That means we still spend the memory. So what is the point of generators here, either?

There is no point to generators here. Any time you see n for n in <expr> without either a more complex expression before the for or an if <expr> filter after it, that's a code smell, as you could just use the original sequence directly. Generators only become useful when you transform the input values into something else or filter them.
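For instance (Python 3 syntax), a generator expression that both transforms and filters does useful work per element, while still holding only one result at a time:

```python
# A generator expression earns its keep once it transforms or filters.
nums = [1, 2, 3, 4, 5, 6]

evens_squared = (n ** 2 for n in nums if n % 2 == 0)
result = list(evens_squared)
print(result)  # [4, 16, 36]
```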

2 Comments

I should note that I'm practising for coding interviews and am trying to get solutions with O(1) memory using iterators.
A generator can also filter items, so n for n in sequence if predicate(n) may be useful without modifying any individual input value.
