
I'm new to Python and have a big memory issue. My script runs 24/7, and each day it allocates about 1 GB more memory. I narrowed it down to this function:

Code:

#!/usr/bin/env python
# coding: utf8
import gc
from pympler import muppy
from pympler import summary
from pympler import tracker


v_list = [{ 
     'url_base' : 'http://www.immoscout24.de',
     'url_before_page' : '/Suche/S-T/P-',
     'url_after_page' : '/Wohnung-Kauf/Hamburg/Hamburg/-/-/50,00-/EURO--500000,00?pagerReporting=true',}]

# returns url
def get_url(v, page_num):
    return v['url_base'] + v['url_before_page'] + str(page_num) + v['url_after_page']


while True:
    gc.enable()

    for v_idx,v in enumerate(v_list):

        # memory test output (before)
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)


        # magic happens here
        url = get_url(v, 1)


        # memory test output (after)
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)

        # collect unreachable objects (reference cycles)
        gc.collect()

Output:

                   types |   # objects |   total size
======================== | =========== | ============
                    list |       26154 |     10.90 MB
                     str |       31202 |      1.90 MB
                    dict |         507 |    785.88 KB

Especially the list entry grows by around 600 KB every cycle, and I have no idea why. As far as I can tell I don't store anything here, and the url variable should be overwritten each time, so there shouldn't be any memory growth at all.

What am I missing here? :-)

  • When you run just this example code—in particular, with this minimal get_url function—you see 600KB of list storage every loop? Or you see that with your real program, which does "magic" that you're not showing us? Commented Oct 24, 2014 at 18:36
  • Yes, if I run this example code I get around 600KB of list storage every loop. Python 2.7.2 on macos and linux. Commented Oct 24, 2014 at 18:43

1 Answer


This "memory leak" is 100% caused by your testing for memory leaks. The all_objects list ends up holding references to almost every object you ever created, including the ones you no longer need, which would have been cleaned up if they weren't in all_objects. But they are.

As a quick test:

  • If I run this code as-is, I get the list value growing by about 600KB/cycle, just as you say in your question, at least up to 20MB, where I killed it.

  • If I add del all_objects right after the sum1 = line, however, I get the list value bouncing back and forth between 100KB and 650KB.

If you think about why this is happening, it's pretty obvious in retrospect. When you call muppy.get_objects() (every time except the first), the previous value of all_objects is still alive, so it is one of the objects that gets returned. That means that when you assign the return value to all_objects, you're not freeing the old value; you're just dropping its refcount from 2 to 1. That keeps alive not just the old list itself but every element in it, which, by definition, is everything that was alive the last time through the loop.
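To see the first step of that reasoning in isolation, here is a minimal sketch that uses the standard library's gc.get_objects() as a stand-in for muppy.get_objects() (an assumption: muppy builds its result from gc.get_objects(), but the function below is a hypothetical helper, not part of pympler). It checks whether a snapshot taken while the previous one is still bound to a name contains that previous list:

```python
import gc


def new_snapshot_contains_old():
    """Hypothetical helper: does a fresh snapshot contain the previous one?"""
    all_objects = gc.get_objects()   # first snapshot, still bound to a name
    previous = all_objects           # extra handle so we can look for it
    all_objects = gc.get_objects()   # second snapshot, taken while the first is alive
    # Lists are GC-tracked containers, so the first snapshot is one of the
    # objects gc.get_objects() reports the second time around.
    return any(obj is previous for obj in all_objects)


print(new_snapshot_contains_old())  # True
```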

If you can find a memory-exploring library that gives you weakrefs instead of normal references, that might help. Otherwise, make sure to do a del all_objects at some point before calling muppy.get_objects again. (Right after the only place you use it, the sum1 = line, seems like the most obvious place.)
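A rough way to check the del fix without pympler is to tag each loop iteration with a throwaway object and watch, through a weak reference, whether it survives. The Temporary class and count_survivors function are hypothetical names for this sketch, and gc.get_objects() again stands in for muppy.get_objects(). On CPython, every temporary should survive when the snapshot is kept, and none when it is deleted:

```python
import gc
import weakref


class Temporary:
    """Hypothetical stand-in for the short-lived objects each iteration creates."""


def count_survivors(cycles, delete_snapshot):
    """Count how many per-iteration temporaries are still alive after the loop."""
    refs = []  # weak references only, so this list keeps nothing alive
    for _ in range(cycles):
        temp = Temporary()
        refs.append(weakref.ref(temp))

        # Like muppy.get_objects(): every GC-tracked object, including the
        # previous snapshot if all_objects still points to it.
        all_objects = gc.get_objects()
        # ... the question summarizes all_objects here ...

        if delete_snapshot:
            del all_objects  # drop the snapshot before the next get_objects() call
        del temp
    gc.collect()
    return sum(ref() is not None for ref in refs)


print(count_survivors(5, delete_snapshot=False))  # 5
print(count_survivors(5, delete_snapshot=True))   # 0
```

Without the del, each snapshot holds the one before it, so the whole chain (and everything each snapshot references) stays reachable, which matches the steady list growth in the question.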


3 Comments

Oh dear... the funny thing is, my real program still leaks somewhere, even though it runs without the muppy stuff. I obviously have to search again, this time with 'del all_objects'. Thank you, mate!
Or simply don't store it in a variable: summary.print_(summary.summarize(muppy.get_objects()))
@Ngenator: Yeah, I was thinking about recommending that. What it comes down to is, which is clearer to someone (like the OP in 6 months) who needs to read or extend this code? I think the explicit del serves as a nice reminder that we've just created a list of the entire universe… but on the other hand you could argue that reminder isn't important enough to let it get in the way of the code.
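Ngenator's inline variant can be checked the same way. In this sketch (again with a hypothetical Temporary class and count_survivors_inline function, and gc.get_objects() in place of muppy.get_objects()), the snapshot is only ever a temporary argument to len(), so nothing carries over from one iteration to the next:

```python
import gc
import weakref


class Temporary:
    """Hypothetical stand-in for a short-lived per-iteration object."""


def count_survivors_inline(cycles):
    refs = []  # weak references only
    for _ in range(cycles):
        temp = Temporary()
        refs.append(weakref.ref(temp))
        # The snapshot is never bound to a name, so it is freed as soon as
        # len() returns and cannot end up inside the next snapshot.
        len(gc.get_objects())
        del temp
    gc.collect()
    return sum(ref() is not None for ref in refs)


print(count_survivors_inline(5))  # 0
```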
