How to avoid enormous additional memory consumption when using numpy vectorize?

Question

This code below best illustrates my problem:

The output to the console (NB it takes ~8 minutes to run even the first test) shows the 512x512x512x16-bit array allocations consuming no more than expected (256MByte for each one), and looking at "top" the process generally remains sub-600MByte as expected.

However, while the vectorized version of the function is being called, the process expands to an enormous size (over 7GByte!). The most obvious explanation I can think of, that vectorize is converting the inputs and outputs to float64 internally, would account for only a couple of gigabytes. Besides, the vectorized function returns an int16, and the returned array is certainly an int16. Is there some way to avoid this happening? Am I using or understanding vectorize's otypes argument wrongly?

import numpy as np
import subprocess

def logmem():
    subprocess.call('cat /proc/meminfo | grep MemFree',shell=True)

def fn(x):
    return np.int16(x*x)

def test_plain(v):
    print "Explicit looping:"
    logmem()
    r=np.zeros(v.shape,dtype=np.int16)
    for z in xrange(v.shape[0]):
        for y in xrange(v.shape[1]):
            for x in xrange(v.shape[2]):
                r[z,y,x]=fn(v[z,y,x])
    print type(r[0,0,0])
    logmem()
    return r

vecfn=np.vectorize(fn,otypes=[np.int16])

def test_vectorize(v):
    print "Vectorize:"
    logmem()
    r=vecfn(v)
    print type(r[0,0,0])
    logmem()
    return r

logmem()    
s=(512,512,512)
v=np.ones(s,dtype=np.int16)
logmem()
test_plain(v)
test_vectorize(v)
v=None
logmem()

I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).

No, I've never tried profiling Python code; I was under the impression it would just tell me about time in calls. Can it also tell me something useful about where memory is allocated?
– timday, Commented Aug 16, 2011 at 12:53
Profiling will tell you what is soaking up the most time, which might help you isolate what is causing a slowdown in your code. Heapy will also give you accurate results for your memory footprint.
– Jakob Bowyer, Commented Aug 16, 2011 at 12:58

DaveP · Accepted Answer · 2011-08-16 23:22:31Z

3

It is a basic problem of vectorisation that all intermediate values are also vectors. While this is a convenient way to get a decent speed enhancement, it can be very inefficient with memory and will constantly thrash your CPU cache. To overcome this problem, you need an approach whose explicit loops run at compiled speed, not at Python speed. The best ways to do this are Cython, Fortran code wrapped with f2py, or numexpr. You can find a comparison of these approaches here, although this focuses more on speed than memory usage.
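The point about intermediate vectors can be illustrated without any of the tools named above. The following is only a plain-NumPy sketch, not Cython, f2py or numexpr: it assumes the computation can be written with built-in ufuncs, and it uses a small array and a made-up expression `a*b + c` purely for illustration. The `out=` argument makes each compiled loop write into a preallocated buffer instead of allocating a full-size temporary.

```python
import numpy as np

# Small stand-ins for the real 512x512x512 arrays (illustrative sizes only).
a = np.full((8, 8, 8), 3, dtype=np.int16)
b = np.full((8, 8, 8), 4, dtype=np.int16)
c = np.full((8, 8, 8), 5, dtype=np.int16)

# Naive form: a * b is materialised as a full-size temporary array
# before c is added, so two result-sized arrays exist at once.
naive = a * b + c

# Same arithmetic with explicit output buffers: the loops still run
# at compiled speed, but no extra full-size temporary is allocated.
r = np.empty_like(a)
np.multiply(a, b, out=r)   # r = a * b, written directly into r
np.add(r, c, out=r)        # r = r + c, updated in place
```

This only helps when the per-element function can be expressed through existing ufuncs; an arbitrary Python function still needs one of the compiled approaches mentioned in the answer.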


HYRY · Accepted Answer · 2011-08-16 22:39:08Z

2

You can read the source code of vectorize(). It converts the input array's dtype to object and calls np.frompyfunc() to create a ufunc from your Python function. The ufunc returns an object array, and finally vectorize() converts that object array to an int16 array.

An array whose dtype is object uses a lot of memory.

Using a Python function to do element-wise calculation is slow, even when it has been converted to a ufunc by frompyfunc().
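The object-array step described above can be seen on a small input, and one possible way to bound the extra memory is to apply the vectorized function one slab at a time. The sketch below is not taken from the answer: `vectorize_in_slabs` is a hypothetical helper, the 4x4x4 array is a stand-in for the real data, and it assumes the function is purely element-wise so that slabs can be processed independently. It is written to run on Python 3 as well as Python 2.

```python
import numpy as np

def fn(x):
    return np.int16(x * x)

small = np.ones((4, 4, 4), dtype=np.int16)

# frompyfunc is the building block vectorize relies on; its result is
# an object array, i.e. one pointer to a separate Python object per element.
obj_ufunc = np.frompyfunc(fn, 1, 1)
intermediate = obj_ufunc(small)

vecfn = np.vectorize(fn, otypes=[np.int16])

def vectorize_in_slabs(func, v, out_dtype):
    """Hypothetical helper: apply func to v[z] for each z in turn.

    The object-array temporaries then cover a single slab rather than
    the whole volume, so peak memory stays close to input plus output.
    """
    r = np.empty(v.shape, dtype=out_dtype)
    for z in range(v.shape[0]):
        r[z] = func(v[z])  # temporaries are freed after each slab
    return r

result = vectorize_in_slabs(vecfn, small, np.int16)
```

This does nothing for speed, since fn is still called once per element from Python; it only limits how much object-array storage is alive at any moment.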
