How to vectorize computation on arrays of different dimensions?

Question

I have some large numpy arrays of complex numbers I need to perform computations on.

import numpy as np

# Reduced sizes -- real ones are orders of magnitude larger
n, d, l = 50000, 3, 1000

# Two complex matrices and a column vector
x = np.random.rand(n, d) + 1j*np.random.rand(n, d)
s = np.random.rand(n, d) + 1j*np.random.rand(n, d)
v = np.random.rand(l)[:, np.newaxis]

The function is basically x*v*s for each row of x (and s) and then that product is summed across the row. Because the arrays are different sizes, I can't figure out a way to vectorize the computation and it's way too slow to use a for-loop.

My current implementation is this (~3.5 seconds):

h = []
for i in range(len(x)):
    h.append(np.sum(x[i,:]*v*s[i,:], axis=1))

h = np.asarray(h)

I also tried using np.apply_along_axis() with an augmented matrix but it's only slightly faster (~2.6s) and not that readable.

def func(m, v):
    return np.sum(m[:d]*v*m[d:], axis=1)

h = np.apply_along_axis(func, 1, np.hstack([x, s]), v)

What's a much quicker way to compute this result? I can leverage other packages such as dask if that helps.

How about h2 = ((x * s).sum(axis=1)[:,None] * v.T); np.allclose(h, h2) >>> True — cs95
– cs95, Commented Jun 12, 2019 at 19:46
apply_along_axis is not a speed tool; just a convenience (in some cases). — hpaulj
– hpaulj, Commented Jun 12, 2019 at 20:13

cs95 · Accepted Answer · 2019-06-13 01:43:26Z

With broadcasting this should work:

np.sum(((x*s)[...,None]*v[:,0], axis=1)

but with your sample dimensions I'm getting a memory error. The 'outer' broadcasted array (n,d,l) shape is too large for my memory.

I can reduce memory usage by iterating on the smaller d dimension:

res = np.zeros((n,l), dtype=x.dtype) 
for i in range(d): 
    res += (x[:,i]*s[:,i])[:,None]*v[:,0]

This tests the same as your h, but I wasn't able to complete time tests. Generally iterating on the smaller dimension is faster.

I may repeat things with small dimensions.

This probably can also be expressed as an einsum problem, though it may not help with these dimensions.

In [1]: n, d, l = 5000, 3, 1000 
   ...:  
   ...: # Two complex matrices and a column vector 
   ...: x = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: s = np.random.rand(n, d) + 1j*np.random.rand(n, d) 
   ...: v = np.random.rand(l)[:, np.newaxis]                                                           
In [2]:                                                                                                
In [2]: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...:  
   ...: h = np.asarray(h)                                                                              
In [3]: h.shape                                                                                        
Out[3]: (5000, 1000)
In [4]: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                
In [5]: res.shape                                                                                      
Out[5]: (5000, 1000)
In [6]: np.allclose(res,h)                                                                             
Out[6]: True
In [7]: %%timeit  
   ...: h = [] 
   ...: for i in range(len(x)): 
   ...:     h.append(np.sum(x[i,:]*v*s[i,:], axis=1)) 
   ...: h = np.asarray(h) 
   ...:  
   ...:                                                                                                
490 ms ± 3.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: %%timeit 
   ...: res = np.zeros((n,l), dtype=x.dtype) 
   ...: for i in range(d): 
   ...:     res += (x[:,i]*s[:,i])[:,None]*v[:,0] 
   ...:                                                                                                

354 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]:                                                                                                
In [9]: np.sum((x*s)[...,None]*v[:,0], axis=1).shape                                                   
Out[9]: (5000, 1000)
In [10]: out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                                  
In [11]: np.allclose(h,out)                                                                            
Out[11]: True
In [12]: timeit out = np.sum((x*s)[...,None]*v[:,0], axis=1)                                           
310 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Some time savings, but not big.

And the einsum version:

In [13]: np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                                     
Out[13]: (5000, 1000)
In [14]: np.allclose(np.einsum('ij,ij,k->ik',x,s,v[:,0]),h)                                            
Out[14]: True
In [15]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0]).shape                                              
167 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Good time savings. But I don't know how it will scale.

But the einsum made me realize that we can sum on d dimension earlier, before multiplying by v - and gain a lot in time and memory usage:

In [16]: np.allclose(np.sum(x*s, axis=1)[:,None]*v[:,0],h)                                             
Out[16]: True
In [17]: timeit np.sum(x*s, axis=1)[:,None]*v[:,0]                                                     
68.4 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@cs95 got there first!

As per @PaulPanzer's comment, the optimize flag helps. It's probably making the same deduction - that we can sum on j early:

In [18]: timeit np.einsum('ij,ij,k->ik',x,s,v[:,0],optimize=True).shape                                
91.6 ms ± 991 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This seems to be one of the cases where the optimize flag actually helps: np.einsum('ij,ij,kl->ik',x,s,v,optimize=True) is twice as fast on my rig. But np.outer(np.einsum('ij,ij->i',x,s), v) is still faster.
Enjoyed reading the thought process behind the gradual optimization of this operation. The einsum trick is pretty nice as well.
What are my options if the computation is np.sum(np.sin(x[i,:]*v)*s[i,:], axis=1) and I can't rely on the x*s optimization?

Collectives™ on Stack Overflow

How to vectorize computation on arrays of different dimensions?

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related