6

I'm fighting a memory leak in a Python project and have already spent a lot of time on it. I have reduced the problem to a small example. It now seems that I know the solution, but I can't understand why it works.

import random

def main():
    d = {}
    used_keys = []
    n = 0
    while True:
        # choose a key unique enough among used previously
        key = random.randint(0, 2 ** 60)
        d[key] = 1234 # the value doesn't matter
        used_keys.append(key)
        n += 1
        if n % 1000 == 0:
            # clean up every 1000 iterations
            print 'thousand'
            for key in used_keys:
                del d[key]
                used_keys[:] = []
                #used_keys = []

if __name__ == '__main__':
    main()

The idea is that I store some values in the dict d and remember the used keys in a list so that I can clean up the dict from time to time.

This version of the program steadily eats memory and never returns it. If I use the alternative method to "clear" used_keys, the one that is commented out in the example, all is fine: memory consumption stays at a constant level.

Why?

Tested with CPython on many Linux systems.

4
  • How do you know for sure it never returns it? It might just be that the OS never asks for it back. Commented Jul 30, 2010 at 8:25
  • 2
    Shouldn't clearing used_keys be outside of the for key in used_keys loop? Commented Jul 30, 2010 at 8:27
  • 2
    > The idea is that I store some values in the dict d and memorize used keys in a list to be able to clean the dict from time to time. Why not just use d.keys()? It will be the same list of keys. Commented Jul 30, 2010 at 8:28
  • @adamk See my comment on the accepted answer. @Daniel and @gnibbler It's just a model; if it were stand-alone code, I wouldn't use such odd methods. Commented Jul 30, 2010 at 8:43

2 Answers

5

Here's the reason: the current method does not delete all the keys from the dict (it deletes only one per cleanup, actually). This is because you clear the used_keys list in place during the loop, so the loop finds nothing left to iterate over and exits prematurely.

The second (commented-out) method, however, does work: it binds the name used_keys to a new list while the loop keeps iterating over the original list object, so the loop finishes successfully.

See the difference between:

>>> a=[1,2,3]
>>> for x in a:
...    print x
...    a=[]
...
1
2
3

and

>>> a=[1,2,3]
>>> for x in a:
...    print x
...    a[:] = []
...
1
>>>
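
Applied to the cleanup in the question, the same effect can be sketched as follows. This is a minimal illustration, not the asker's real code: the helper names cleanup_buggy, cleanup_fixed and leftover_after are hypothetical, and it uses Python 3 syntax rather than the Python 2 print statement used elsewhere on this page.

```python
# Hypothetical helpers for illustration only; none of these names come from the question.

def cleanup_buggy(d, used_keys):
    """Clear used_keys in place inside the loop, as in the question."""
    for key in used_keys:
        del d[key]
        used_keys[:] = []  # the list being iterated is now empty, so the loop ends here


def cleanup_fixed(d, used_keys):
    """Delete every tracked key, then clear the list once, after the loop."""
    for key in used_keys:
        del d[key]
    used_keys[:] = []


def leftover_after(cleanup, count=1000):
    """Fill a dict with `count` keys, run `cleanup`, and report how many entries remain."""
    d = {key: 1234 for key in range(count)}
    used_keys = list(d)
    cleanup(d, used_keys)
    return len(d)


print(leftover_after(cleanup_buggy))  # 999: only the first key was deleted
print(leftover_after(cleanup_fixed))  # 0
```

With the in-place clear inside the loop, each cleanup removes a single entry, so the dict in the question grows by 999 entries per thousand iterations, which would explain the steadily rising memory use.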

1 Comment

Ah!! I'm stupid, stupid, stupid. I was so happy to have reproduced the memory leak in a small snippet… It is a sad mistake, of course. It doesn't represent my real problem, so I'm going to keep hunting. But your answer to the original question is right. Thanks!
0

Why wouldn't something like this work?

from itertools import count
import uuid

def main():
    d = {}
    for n in count(1):
        # choose a key unique enough among used previously
        key = uuid.uuid1()
        d[key] = 1234 # the value doesn't matter
        if n % 1000 == 0:
            # clean up every 1000 iterations
            print 'thousand'
            d.clear()

if __name__ == '__main__':
    main()
