
I am dealing with a lot of data, stored in an np.array of size (3, n).

I need to compute, for each datum x (a 3x1 column vector): xx.T, which forms a 3x3 matrix. That gives n such 3x3 matrices, which I then need to sum into a single overall 3x3 matrix.

I did this as follows: np.einsum('ij,ki->jk', x, x.T), which seems to work. The problem is that my n is incredibly large (I'm dealing with very large images of many megapixels), and this method results in a MemoryError.

Are there any other ways I can do this whilst maintaining a vectorized method?
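For a tiny illustration of what I mean (n = 2 here), this is the naive loop I'm trying to avoid:

```python
import numpy as np

x = np.array([[1., 4.],
              [2., 5.],
              [3., 6.]])   # shape (3, n) with n = 2

# Per-datum outer products, summed -- the result I want, done naively:
total = np.zeros((3, 3))
for i in range(x.shape[1]):
    col = x[:, i:i+1]        # 3x1 column vector
    total += col @ col.T     # 3x3 outer product
print(total)
# [[17. 22. 27.]
#  [22. 29. 36.]
#  [27. 36. 45.]]
```

(Which is of course just x @ x.T; the issue is doing it at scale without blowing up memory.)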

Thanks!

  • Give us a measure of dataset sizes involved, i.e. n here? Commented Nov 18, 2017 at 23:06
  • You could simply do : x.T.dot(x). Commented Nov 18, 2017 at 23:09
  • @Divakar: if the array has shape (3,n), wouldn't you want x.dot(x.T)? Commented Nov 18, 2017 at 23:10
  • @DSM Well then np.einsum('ij,ki->jk', x, x.T) won't give a (3,3) output. Something is not clear there. Commented Nov 18, 2017 at 23:11
  • 1
    Maybe that's the glitch. If the einsum is done the wrong way round you would expect memory problems if the "non-3" dim is so large Commented Nov 18, 2017 at 23:20

1 Answer


Working from your comment:

In [465]: x=np.array([[1,2,3],[4,5,6]])
In [466]: np.einsum('ij,ik', x,x)
Out[466]: 
array([[17, 22, 27],
       [22, 29, 36],
       [27, 36, 45]])
In [467]: x=np.ones((10,3))
In [468]: np.einsum('ij,ik', x,x)
Out[468]: 
array([[ 10.,  10.,  10.],
       [ 10.,  10.,  10.],
       [ 10.,  10.,  10.]])
In [471]: x=np.ones((10000000,3))
In [472]: np.einsum('ij,ik', x,x)
Out[472]: 
array([[ 10000000.,  10000000.,  10000000.],
       [ 10000000.,  10000000.,  10000000.],
       [ 10000000.,  10000000.,  10000000.]])

If the array becomes too big for memory, you could process it in chunks and accumulate the result (illustrated here by simply repeating the small example array 10 times):

In [479]: res = np.zeros((3,3))
In [480]: for i in range(10):
     ...:     res += np.einsum('ij,ik',x,x)
     ...:     
In [481]: res
Out[481]: 
array([[ 170.,  220.,  270.],
       [ 220.,  290.,  360.],
       [ 270.,  360.,  450.]])

Generally a few iterations over a complex task are faster than pushing the memory limits without any iterations. At some point the memory-management costs exceed the iteration costs.
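A generic version of this chunked accumulation, as a sketch (the helper name `chunked_gram` is mine; `np.array_split` handles the case where n doesn't divide evenly into the chunk count):

```python
import numpy as np

def chunked_gram(x, n_chunks=10):
    """Accumulate x.T @ x over row chunks to limit peak memory use."""
    res = np.zeros((x.shape[1], x.shape[1]))
    for chunk in np.array_split(x, n_chunks, axis=0):
        res += np.einsum('ij,ik', chunk, chunk)   # chunk.T @ chunk
    return res

x = np.arange(30.0).reshape(10, 3)
print(np.allclose(chunked_gram(x, 4), x.T @ x))  # → True
```

Each chunk only needs its own temporaries in memory, so peak usage scales with the chunk size rather than n.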


This simple einsum is just as easily solved with dot, and potentially faster:

In [484]: x.T.dot(x)
Out[484]: 
array([[17, 22, 27],
       [22, 29, 36],
       [27, 36, 45]])

In [486]: x=np.ones((10000000,3))
In [487]: timeit np.einsum('ij,ik',x,x)
426 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [488]: x1=np.ones((50000000,3))
In [490]: timeit np.einsum('ij,ik',x1,x1)
2.14 s ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [493]: %%timeit
     ...: res = np.zeros((3,3))
     ...: x2 = x1.reshape(5,-1,3)
     ...: for i in range(5):
     ...:    x3 = x2[i]
     ...:    res += np.einsum('ij,ik',x3,x3) 
2.1 s ± 5.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So even though I'm not hitting a memory limit yet, chunking gives me a slight speed improvement. (Though I'm not far from memory limits; a few left-over items in the IPython history can push me over.)

And just for comparison, the @ operator (same as dot for 2D arrays):

In [494]: %%timeit
     ...: res = np.zeros((3,3))
     ...: x2 = x1.reshape(5,-1,3)
     ...: for i in range(5):
     ...:    x3 = x2[i]
     ...:    res += x3.T@x3
537 ms ± 9.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [495]: timeit x1.T@x1
530 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [496]: timeit x.T@x
106 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

At this size chunking does not help when using dot.
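Note that the question's array is (3, n), the transpose of the layout used above, so there the single call is x @ x.T and any chunking splits along axis 1. A sketch under that layout:

```python
import numpy as np

x = np.random.rand(3, 1000)   # (3, n) layout, as in the question

direct = x @ x.T              # one shot: 3x3

# Chunked along the n axis, accumulating 3x3 partial results:
res = np.zeros((3, 3))
for chunk in np.array_split(x, 10, axis=1):
    res += chunk @ chunk.T

print(np.allclose(direct, res))  # → True
```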
