
I have a Python script that runs a loop. On each iteration the loop calls a function, DoDebugInfo, once. This function saves some plots to disk using matplotlib, exports a KML file, does some other calculations, and returns nothing.

The problem is that the call to DoDebugInfo eats more and more RAM on every iteration. I guess some variable is growing in size on each loop.

I added the following lines before and after the call:

print '=== before: ' + str(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
DoDebugInfo(inputs)
print '=== after: ' + str(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)

The output is:

=== before: 71598.08
=== after: 170237.952
=== before: 170237.952
=== after: 255696.896
=== before: 255696.896
=== after: 341409.792

As you can see, the memory footprint grows with each call but stays stable between one call and the next.
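A side note on the measurement itself: the units of ru_maxrss are platform-dependent (bytes on macOS, kilobytes on Linux), so dividing by 1000 means different things on different systems. A small stdlib-only helper can normalize this (a sketch; max_rss_mb is a made-up name, not part of the original script):

```python
import resource
import sys

def max_rss_mb():
    """Return peak resident set size in MiB, normalizing platform units."""
    rss = float(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    if sys.platform == 'darwin':
        rss /= 1024.0  # macOS reports bytes; convert to KiB first
    return rss / 1024.0  # KiB -> MiB
```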

Why is this? Since DoDebugInfo(inputs) is a function that returns nothing, how can some variables stay in memory? Do I need to clear all variables at the end of the function?

Edit: DoDebugInfo imports these functions:

import datetime

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

def plot_line(x,y,kind,lab_x,lab_y,filename):
    fig = plt.figure(figsize=(11,6),dpi=300)
    ax = fig.add_subplot(111)
    ax.grid(True,which='both')
    #print 'plotting'
    if type(x[0]) is datetime.datetime:
        #print 'datetime detected'
        ax.plot_date(matplotlib.dates.date2num(x),y,kind)
        ax.fmt_xdata = DateFormatter('%H')
        ax.autoscale_view()
        fig.autofmt_xdate()
    else:   
        #print 'no datetime'
        ax.plot(x,y,kind)
    xlabel = ax.set_xlabel(lab_x)
    ax.set_ylabel(lab_y)
    fig.savefig(filename,bbox_extra_artists=[xlabel], bbox_inches='tight')

def plot_hist(x,Nbins,lab_x,lab_y,filename):
    fig = plt.figure(figsize=(11,6),dpi=300)
    ax = fig.add_subplot(111)
    ax.grid(True,which='both')
    ax.hist(x,Nbins)
    xlabel = ax.set_xlabel(lab_x)
    ax.set_ylabel(lab_y)
    fig.savefig(filename,bbox_extra_artists=[xlabel], bbox_inches='tight')

and plots 10 figures to the disk using something like:

plot_line(index,alt,'-','Drive Index','Altitude in m',output_dir + 'name.png')

If I comment out the lines that call plot_line the problem does not happen, so the leak must be somewhere in that code.
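One quick way to confirm this kind of leak (a sketch, assuming a non-interactive matplotlib backend) is to count the figures pyplot still holds after each iteration. pyplot keeps an internal reference to every figure created through plt.figure() until it is explicitly closed, so figures created in a function survive the function returning:

```python
import matplotlib
matplotlib.use('Agg')  # file-only backend, no GUI windows
import matplotlib.pyplot as plt

for i in range(3):
    fig = plt.figure()
    fig.add_subplot(111).plot([1, 2], [3, 4])
    # without plt.close(fig), every figure stays registered with pyplot
    print('open figures: %d' % len(plt.get_fignums()))
```

If the count grows by one per iteration, the figures are being kept alive by pyplot's registry rather than by any local variable.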

Thanks

  • Show us your DoDebugInfo function. Commented Apr 18, 2013 at 10:53
  • A function that returns nothing can still alter globals, or use a mutable parameter that is not cleaned up between calls. Commented Apr 18, 2013 at 10:58
  • @eumiro I have narrowed down the leak; please take a look at the functions I'm using inside DoDebugInfo. The leak is somewhere in there. Thanks Commented Apr 18, 2013 at 11:05
  • @MartijnPieters the function does not alter globals... and I don't know what a mutable parameter is, but I'll check it. Thanks Commented Apr 18, 2013 at 11:06
  • Martijn is referring to the mutable default argument gotcha. Commented Apr 18, 2013 at 11:06

2 Answers


The problem is that so many figures are created and never closed, so matplotlib keeps them all alive.

I added the line

plt.close()

to each of my plot functions plot_line and plot_hist and the problem is gone.
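For reference, a minimal sketch of plot_line with the fix applied (assumptions: the Agg backend is in use, and the datetime branch is omitted for brevity). Passing the figure to plt.close(fig) closes exactly the figure just saved, rather than relying on plt.close() picking the "current" one:

```python
import os
import tempfile
import matplotlib
matplotlib.use('Agg')  # file-only backend
import matplotlib.pyplot as plt

def plot_line(x, y, kind, lab_x, lab_y, filename):
    fig = plt.figure(figsize=(11, 6), dpi=300)
    ax = fig.add_subplot(111)
    ax.grid(True, which='both')
    ax.plot(x, y, kind)
    ax.set_xlabel(lab_x)
    ax.set_ylabel(lab_y)
    fig.savefig(filename, bbox_inches='tight')
    plt.close(fig)  # drop pyplot's reference so the figure can be freed

# usage: after the call, no figures remain registered with pyplot
out = os.path.join(tempfile.mkdtemp(), 'line.png')
plot_line([1, 2, 3], [4, 5, 6], '-', 'x', 'y', out)
```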


3 Comments

I wonder if you could wrap that thing in a with statement?
@BenDundee docs.python.org/2/library/contextlib.html allows easy creation of context managers
I think that with could cause the same issue.
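The with idea from the comments does work if the context manager closes the figure on exit; plain with on a figure would not help, since figures are not context managers here. A sketch using contextlib (managed_figure is a hypothetical helper name):

```python
import contextlib
import matplotlib
matplotlib.use('Agg')  # file-only backend
import matplotlib.pyplot as plt

@contextlib.contextmanager
def managed_figure(*args, **kwargs):
    """Yield a new figure and guarantee it is closed afterwards."""
    fig = plt.figure(*args, **kwargs)
    try:
        yield fig
    finally:
        plt.close(fig)  # runs even if plotting raises

# usage: the figure is closed as soon as the block exits
with managed_figure(figsize=(11, 6), dpi=300) as fig:
    ax = fig.add_subplot(111)
    ax.plot([1, 2, 3], [4, 5, 6], '-')
```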
0

Does the size grow without bound? Very few programs (or libraries) return allocated heap memory to the system even once it is no longer used, and CPython (2.7.3) is no exception. The usual culprit is malloc, which increases process memory on demand and returns space to its free list upon free, but never de-allocates what it has requested from the system. This sample code intentionally grabs memory and shows that the process use is bounded and finite:

import resource

def maxrss(start, end, step=1):
    """allocate ever larger strings and show the process rss"""
    for exp in range(start, end, step):
        s = '0' * (2 ** exp)
        print '%5i: %sk' % (exp, 
            resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    # s goes out of scope here and is freed by Python but not returned 
    # to the system

try:
    maxrss(1, 40, 5)
except MemoryError:
    print 'MemoryError'

maxrss(1, 30, 5)

Where the output (on my machine) is, in part:

26: 72k
31: 2167k
MemoryError
 1: 2167k
 6: 2167k
 ...
26: 2170k

This shows that the interpreter failed to get 2**36 bytes of heap from the system, but still had the memory "on hand" to fill later requests. As the last call, maxrss(1, 30, 5), demonstrates, the memory is there for Python to use, even if it is not currently using it.

1 Comment

It actually keeps growing in memory until my Mac gets difficult to operate and the UI is almost frozen. I killed the Python app and it released 4 GB of RAM. I don't know what would happen if I let it go further...
