This code below best illustrates my problem:
The output to the console (NB it takes ~8 minutes to run even the first test) shows the 512x512x512x16-bit array allocations consuming no more than expected (256MByte for each one), and looking at "top" the process generally remains sub-600MByte as expected.
However, while the vectorized version of the function is being called, the process expands to enormous size (over 7GByte!). Even the most obvious explanation I can think of to account for this - that vectorize is converting the inputs and outputs to float64 internally - could only account for a couple of gigabytes, even though the vectorized function returns an int16, and the returned array is certainly an int16. Is there some way to avoid this happening ? Am I using/understanding vectorize's otypes argument wrong ?
import numpy as np
import subprocess
def logmem():
subprocess.call('cat /proc/meminfo | grep MemFree',shell=True)
def fn(x):
return np.int16(x*x)
def test_plain(v):
print "Explicit looping:"
logmem()
r=np.zeros(v.shape,dtype=np.int16)
for z in xrange(v.shape[0]):
for y in xrange(v.shape[1]):
for x in xrange(v.shape[2]):
r[z,y,x]=fn(x)
print type(r[0,0,0])
logmem()
return r
vecfn=np.vectorize(fn,otypes=[np.int16])
def test_vectorize(v):
print "Vectorize:"
logmem()
r=vecfn(v)
print type(r[0,0,0])
logmem()
return r
logmem()
s=(512,512,512)
v=np.ones(s,dtype=np.int16)
logmem()
test_plain(v)
test_vectorize(v)
v=None
logmem()
I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).