I need to improve the performance of an operation performed on arrays of different shapes/sizes. The array pos has a shape of (2, M*M) with M = 500 (i.e. (2, 250000)), and the xa, xb, ya, yb arrays have shapes of (30,).

The operation shown in the MVCE below combines each of the two dimensions of pos with xa, xb and ya, yb.

Can this be done applying numpy broadcasting?

import numpy as np

# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

# Apply broadcasting on the operation performed by this 'for' block?
vals = []
for p in zip(*pos):
    vals.append(np.sum(np.exp(-0.5 * (
        ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))

1 Answer

You can use np.tile and replace the for loop as follows:

xa_tiled = np.tile(xa, (pos.shape[1],1))
xb_tiled = np.tile(xb, (pos.shape[1],1))
ya_tiled = np.tile(ya, (pos.shape[1],1))
yb_tiled = np.tile(yb, (pos.shape[1],1))

vals_ = np.exp(-0.5 * (
    ((pos[0].reshape(pos.shape[1], 1) - xa_tiled) / xb_tiled)**2 +
    ((pos[1].reshape(pos.shape[1], 1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
vals_ = vals_.sum(axis=1)

Explanation:

  1. In each iteration you take pos[0][i] and pos[1][i] and operate on xa, xb, ya, yb.
  2. np.tile replicates all four of these arrays 250000 times, which is pos.shape[1], i.e. the number of loop iterations.
  3. We also need to reshape pos[0] and pos[1] into 2D column vectors so the elementwise operations are valid.
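The tiled copies above are not strictly required: NumPy broadcasting does the replication implicitly if you add a length-1 axis to the pos rows. Below is a sketch of that idea (the grid is shrunk to M = 50 only to keep the example quick; the arithmetic is identical to the loop body):

```python
import numpy as np

# Same setup as the question, with a smaller grid for speed
np.random.seed(0)
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

M = 50
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

# pos[0][:, None] has shape (M*M, 1); xa has shape (N,).
# Broadcasting them gives an (M*M, N) result with no explicit tiling.
vals_bc = np.sum(
    np.exp(-0.5 * (((pos[0][:, None] - xa) / xb)**2
                   + ((pos[1][:, None] - ya) / yb)**2)) / (xb * yb),
    axis=1)
```

The intermediate array still has shape (M*M, N), so the memory footprint matches the tiled version; the gain is purely in code clarity and in skipping the four np.tile allocations.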

Timing details: On my machine the vectorized code takes ~0.20 s whereas the non-vectorized code takes around 3 s. Below is the code to reproduce:

import numpy as np
import time

# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

# Apply broadcasting on the operation performed by this 'for' block?

start = time.time()
for i in range(10):
    vals = []
    for p in zip(*pos):
        vals.append(np.sum(np.exp(-0.5 * (
            ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))
stop = time.time()
print((stop - start) / 10)

start = time.time()

for i in range(10):
    xa_tiled = np.tile(xa, (pos.shape[1],1))
    xb_tiled = np.tile(xb, (pos.shape[1],1))
    ya_tiled = np.tile(ya, (pos.shape[1],1))
    yb_tiled = np.tile(yb, (pos.shape[1],1))

    vals_ = np.exp(-0.5 * (
        ((pos[0].reshape(pos.shape[1], 1) - xa_tiled) / xb_tiled)**2 +
        ((pos[1].reshape(pos.shape[1], 1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
    vals_ = vals_.sum(axis=1)
stop = time.time()
print((stop - start) / 10)

print(np.allclose(vals_, np.array(vals)))
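Note that the tiled intermediate has shape (250000, 30), which is where the memory cost comes from. If that is a concern, the same sum can be evaluated in fixed-size chunks so the peak allocation stays bounded. This is only a sketch; the chunk size of 10000 is an arbitrary choice, and the helper name eval_chunked is mine, not part of the original code:

```python
import numpy as np

np.random.seed(1)
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

M = 100
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

def eval_chunked(pos, xa, xb, ya, yb, chunk=10000):
    """Evaluate the sum in chunks of `chunk` grid points, so the
    broadcast intermediate is at most (chunk, N) instead of (M*M, N)."""
    out = np.empty(pos.shape[1])
    for i in range(0, pos.shape[1], chunk):
        px = pos[0, i:i + chunk, None]  # shape (chunk, 1)
        py = pos[1, i:i + chunk, None]
        out[i:i + chunk] = np.sum(
            np.exp(-0.5 * (((px - xa) / xb)**2 + ((py - ya) / yb)**2))
            / (xb * yb), axis=1)
    return out

vals_chunked = eval_chunked(pos, xa, xb, ya, yb)
```

This trades a small amount of Python-loop overhead (M*M / chunk iterations) for a bounded working set, which matters once the grid grows beyond what fits comfortably in RAM.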

8 Comments

Thank you for the answer, but have you timed this approach? It seems to be quite a bit slower than mine.
Let me check that; it seemed faster to me. Brb
@Gabriel Updated the timing details
Checked again: on my system it's not 10X faster as you find, but it is at least 2X faster (not sure why I was getting slower timings before). Thank you Umang!
@Gabriel Yes, that could be the case, as np.tile ends up creating a big array, and I think that might also be the reason for your slowdown. If you are really low on memory, your approach is better. But for the current grid size (M = 500, i.e. 250000 points), the code did not take much RAM for me (<500 MB).