
I am writing a function that accepts a numpy array a of length 200 and a matrix M of size 200 x 200, and performs the following operation:

def my_func(a, M):
    d = np.empty((len(a), len(a)))
    for i in range(len(a)):
        x = a[i]
        for j in range(len(a)):
            y = a[j]
            z = M[i][j]
            d[i][j] = 2 * z/(y+x)
    return d

How can I vectorize this piece of code to boost my runtime?

  • You have to specify the shape of a and M.
  • Please check the updated description.

4 Answers


NumPy's ufuncs all have an outer method that performs the operation "cross-wise" on two arrays. So, to avoid most intermediate calculations and vectorize as far as possible:

def f(M, a):
    return 2 * M / np.add.outer(a, a)
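
A quick sanity check on dummy inputs (this snippet is my addition, not part of the original answer; the names a, M and d_loop are just illustrative):

import numpy as np

a = np.random.random(200)
M = np.random.random((200, 200))

# np.add.outer(a, a)[i, j] == a[i] + a[j], so f reproduces d[i][j] = 2 * z/(y+x)
d_loop = np.empty_like(M)
for i in range(len(a)):
    for j in range(len(a)):
        d_loop[i, j] = 2 * M[i, j] / (a[j] + a[i])

print(np.allclose(f(M, a), d_loop))  # expected: True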

Answer for the old version of the question (left, because it's still useful):

For such things, I find it best to work in steps, gradually rewriting towards the right einsum expression.

# the definition given in the original question,
# before the z / (y + x) update
def f0():
    d = np.empty((3,3))
    for i in range(len(a)):
        x = a[i]
        for j in range(len(a)):
            y = a[j]
            z = M[i][j]
            d[i][j] = 2 * x/(y+z)
    return d

# rewrite things inlined
def f1():
    d = np.empty((3,3))
    for i in range(len(a)):
        for j in range(len(a)):
            d[i, j] = 2 * a[i]/(a[j] + M[i, j])
    return d

# factor out broadcasting
def f2():
    d = np.empty((3,3))
    for i in range(len(a)):
        m = a + M[i, :]
        for j in range(len(a)):
            d[i,j] = 2 * a[i]/m[j]
    return d

# more broadcasting
def f3():
    d = np.empty((3,3))
    m = a + M
    for i in range(len(a)):
        for j in range(len(a)):
            d[i,j] = 2 * a[i]/m[i,j]
    return d

# now turn loops into einsums
def f4():
    d = np.empty((3,3))
    m = 1/(a + M)
    d[:,:] = 2 * np.einsum('i,ij->ij', a, m)
    return d

# collect everything
def f5():
    return np.einsum('i,ij->ij', a, 2 / (a + M))
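
As a sanity check (my addition, using small dummy globals a and M matching the (3, 3) shapes above): f5 reproduces f0, i.e. the pre-edit formula 2 * x/(y+z), so comparing it against the updated 2 * z/(y+x) function will not agree.

import numpy as np

a = np.random.random(3)
M = np.random.random((3, 3))

print(np.allclose(f0(), f5()))  # expected: True — f5 matches the old formula, not the updated one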

Comments

  • Thank you for your inputs! I compared the output of f5 with my function, but the outputs weren't the same (I checked using np.allclose()).
  • That's interesting. It works for me on dummy inputs.
  • Can you inspect the actual differences by eye? Are your results completely off?
  • Yeah, the max value in the 200 x 200 matrix for my solution is ~1.76; with the einsum operation it's > 80000. I'm testing the functions on dummy data generated using np.random.random().
  • How are you testing with dummy inputs?

You could do something like

d = 2*numpy.atleast_2d(a).T/(a+M)

(Note that this computes 2 * x/(y+z), i.e. the original, pre-edit version of the formula, not the updated 2 * z/(y+x).)
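
A small sketch of what this broadcast does (my addition, with dummy data):

import numpy

a = numpy.random.random(200)
M = numpy.random.random((200, 200))

# atleast_2d(a).T has shape (200, 1), so it broadcasts against (a + M) of shape (200, 200):
# d[i, j] == 2 * a[i] / (a[j] + M[i, j])
d = 2 * numpy.atleast_2d(a).T / (a + M)
print(d.shape)  # (200, 200)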



Apart from numpy vectorization, using Numba is also a simple and performant way to speed up loopy code. Example:

import numpy as np
import numba as nb

@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def calc(a, M):
    d = np.empty((a.shape[0], a.shape[0]))
    for i in nb.prange(a.shape[0]):
        x = a[i]
        for j in range(a.shape[0]):
            y = a[j]
            z = M[i, j]
            d[i, j] = 2. * z / (y + x)
    return d

Timings

M=np.random.rand(200,200)
a=np.random.rand(200)

d=calc(a,M) #first call takes longer due to compilation overhead
%timeit d=calc(a,M)
#parallel=True there is only quite limited speedup because of the small problem (200x200)
#11 µs ± 51 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#parallel=False
#21.2 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

#pure numpy solution (hpaulj)
%timeit d = 2 * M/(a[:,None]+a[None,:])
#75.7 µs ± 386 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

#without compilation
#20.8 ms ± 500 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
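
A quick correctness check (my addition, on the same dummy a and M; the names d_nb and d_np are just illustrative) that the compiled version agrees with the pure numpy broadcast:

d_nb = calc(a, M)
d_np = 2 * M / (a[:, None] + a[None, :])
print(np.allclose(d_nb, d_np))  # expected: True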



With a pair of sample arrays (you should have provided these):

In [196]: a = np.arange(1,4); M = np.arange(1,10).reshape(3,3)                                                       
In [197]: a                                                                                                          
Out[197]: array([1, 2, 3])
In [198]: M                                                                                                          
Out[198]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [199]: d = 2 * M/(a[:,None]+a[None,:])                                                                            
In [200]: d                                                                                                          
Out[200]: 
array([[1.        , 1.33333333, 1.5       ],
       [2.66666667, 2.5       , 2.4       ],
       [3.5       , 3.2       , 3.        ]])

The a[None,:] could be simplified to just a, but I wanted to make the broadcasting used to calculate this outer product explicit. There are various tools in numpy for doing this; I like the None indexing because it is simple and idiomatic.
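
A small illustration of the equivalent spellings of that broadcast (my addition, reusing the sample a from above):

# a[:, None] + a[None, :] builds the (len(a), len(a)) table of pairwise sums a[i] + a[j]
np.allclose(a[:, None] + a[None, :], np.add.outer(a, a))   # True
np.allclose(a[:, None] + a, np.add.outer(a, a))            # also True: the second None is optional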

testing against your code (again, you should have provided such a result):

In [202]: def foo(a): 
     ...:     d = np.zeros((3,3)) 
     ...:     for i in range(len(a)): 
     ...:         x = a[i] 
     ...:         for j in range(len(a)): 
     ...:             y = a[j] 
     ...:             z = M[i][j] 
     ...:             d[i][j] = 2 * z/(y+x) 
     ...:     return d 
     ...:                                                                                                            
In [203]: foo(a)                                                                                                     
Out[203]: 
array([[1.        , 1.33333333, 1.5       ],
       [2.66666667, 2.5       , 2.4       ],
       [3.5       , 3.2       , 3.        ]])
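
And a quick confirmation that the two agree (my addition, continuing the same session; the input/output numbering is hypothetical):

In [204]: np.allclose(d, foo(a))
Out[204]: True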

