Working from your comment:
In [465]: x=np.array([[1,2,3],[4,5,6]])
In [466]: np.einsum('ij,ik', x,x)
Out[466]:
array([[17, 22, 27],
       [22, 29, 36],
       [27, 36, 45]])
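To spell out what `'ij,ik'` computes: the repeated index `i` appears in both operands and, with no explicit output spec, is summed over, so the result is the Gram matrix `out[j,k] = sum_i x[i,j]*x[i,k]`. A quick sketch checking that against an explicit loop:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# 'ij,ik' repeats i in both operands with no output spec, so einsum
# sums over i: out[j, k] = sum_i x[i, j] * x[i, k].
explicit = np.zeros((3, 3), dtype=x.dtype)
for i in range(x.shape[0]):
    explicit += np.outer(x[i], x[i])   # rank-1 update per row

assert np.array_equal(np.einsum('ij,ik', x, x), explicit)
```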
In [467]: x=np.ones((10,3))
In [468]: np.einsum('ij,ik', x,x)
Out[468]:
array([[ 10.,  10.,  10.],
       [ 10.,  10.,  10.],
       [ 10.,  10.,  10.]])
In [471]: x=np.ones((10000000,3))
In [472]: np.einsum('ij,ik', x,x)
Out[472]:
array([[ 10000000.,  10000000.,  10000000.],
       [ 10000000.,  10000000.,  10000000.],
       [ 10000000.,  10000000.,  10000000.]])
If the array becomes too big for memory, you could try chunking it:
In [479]: res = np.zeros((3,3))
In [480]: for i in range(10):
     ...:     res += np.einsum('ij,ik',x,x)
     ...:
In [481]: res
Out[481]:
array([[ 170.,  220.,  270.],
       [ 220.,  290.,  360.],
       [ 270.,  360.,  450.]])
Generally, a few iterations over a large task are faster than pushing against memory limits in a single pass. At some point the memory-management costs exceed the iteration costs.
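The accumulation loop above can be wrapped into a small helper; `chunk_gram` and `n_chunks` are names I've made up for this sketch. `np.array_split` takes care of row counts that don't divide evenly:

```python
import numpy as np

def chunk_gram(x, n_chunks=10):
    """Accumulate np.einsum('ij,ik', x, x) over row chunks to cap peak memory."""
    res = np.zeros((x.shape[1], x.shape[1]))
    for chunk in np.array_split(x, n_chunks):   # handles uneven splits too
        res += np.einsum('ij,ik', chunk, chunk)
    return res

x = np.ones((1000, 3))
assert np.allclose(chunk_gram(x), np.einsum('ij,ik', x, x))
```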
This simple einsum is just as easily solved with dot, and potentially faster:
In [484]: x.T.dot(x)
Out[484]:
array([[17, 22, 27],
       [22, 29, 36],
       [27, 36, 45]])
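A quick sanity check that the two spellings agree. For floats, the BLAS-backed `dot` may differ from `einsum` in the last bits, so `allclose` is the safer comparison there:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
# Integer case: the two results are exactly equal.
assert np.array_equal(np.einsum('ij,ik', x, x), x.T.dot(x))

# Float case: BLAS accumulation order can differ, so compare with allclose.
xf = np.random.default_rng(0).random((1000, 3))
assert np.allclose(np.einsum('ij,ik', xf, xf), xf.T.dot(xf))
```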
In [486]: x=np.ones((10000000,3))
In [487]: timeit np.einsum('ij,ik',x,x)
426 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [488]: x1=np.ones((50000000,3))
In [490]: timeit np.einsum('ij,ik',x1,x1)
2.14 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [493]: %%timeit
     ...: res = np.zeros((3,3))
     ...: x2 = x1.reshape(5,-1,3)
     ...: for i in range(5):
     ...:     x3 = x2[i]
     ...:     res += np.einsum('ij,ik',x3,x3)
2.1 s ± 5.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So even though I'm not hitting a memory limit yet, chunking gives a slight speed improvement (though I'm not far from the memory limit; a few leftover arrays in the IPython history can push me over).
And just for comparison, the `@` operator (equivalent to `dot` for 2d arrays):
In [494]: %%timeit
     ...: res = np.zeros((3,3))
     ...: x2 = x1.reshape(5,-1,3)
     ...: for i in range(5):
     ...:     x3 = x2[i]
     ...:     res += x3.T@x3
537 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [495]: timeit x1.T@x1
530 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [496]: timeit x.T@x
106 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
At this size chunking does not help when using dot.
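If the array genuinely doesn't fit in RAM, the same chunked accumulation works over a memory-mapped file; the filename, shape, and chunk size below are made up for illustration:

```python
import numpy as np
import tempfile, os

# Create a demo memmap; in practice you would map an existing large file.
fname = os.path.join(tempfile.mkdtemp(), 'big.dat')     # hypothetical file
shape = (100_000, 3)                                    # hypothetical size
mm = np.memmap(fname, dtype='float64', mode='w+', shape=shape)
mm[:] = 1.0

# Chunked accumulation: only one slice of rows is in RAM at a time.
res = np.zeros((3, 3))
for start in range(0, shape[0], 10_000):
    chunk = np.asarray(mm[start:start + 10_000])  # copy one chunk into memory
    res += np.einsum('ij,ik', chunk, chunk)

assert np.allclose(res, shape[0])   # all-ones input: every entry equals N
```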
From the comments: `np.einsum('ij,ki->jk', x, x.T)` won't give a `(3,3)` output — something is not clear there. If the `einsum` is done the wrong way round, you would expect memory problems when the "non-3" dimension is this large.