7

The code is below:

import numpy as np
X = np.array(range(15)).reshape(5,3)  # X's element value is meaningless
flag = np.random.randn(5,4)
y = np.array([0, 1, 2, 3, 0])  # Y's element value in range(flag.shape[1]) and Y.shape[0] equals X.shape[0]
dW = np.zeros((3, 4))  # dW.shape equals (X.shape[1], flag.shape[1])
for i in xrange(5):
    for j in xrange(4):
        if flag[i,j] > 0:
            dW[:,j] += X[i,:].T
            dW[:,y[i]] -= X[i,:].T

To compute dW more efficiently, how to vectorize this for loop?

3 Answers 3

5

Here's how I'd do it:

# has shape (x.shape[1],) + flag.shape
masked = np.where(flag > 0, X.T[...,np.newaxis], 0)

# sum over the i index
dW = masked.sum(axis=1)

# sum over the j index
np.subtract.at(dW, np.s_[:,y], masked.sum(axis=2))

# dW[:,y] -= masked.sum(axis=2) does not work here

See the documentation of ufunc.at for an explanation of that last comment

Sign up to request clarification or add additional context in comments.

2 Comments

With repeat values in the y index, -= can have buffering problems. That's where the ufunc .at can help.
So the first line masked = np.where(flag > 0, X.T[...,np.newaxis], 0) use broadcasting, brilliant! So one way to vectorization is increasing dimension?
1

Here's a vectorized approach based upon np.add.reduceat -

# --------------------- Setup output array ----------------------------------
dWOut = np.zeros((X.shape[1], flag.shape[1]))

# ------ STAGE #1 : Vectorize calculations for "dW[:,j] += X[i,:].T" --------
# Get indices where flag's transposed version has > 0
idx1 = np.argwhere(flag.T > 0)

# Row-extended version of X using idx1's col2 that corresponds to i-iterator
X_ext1 = X[idx1[:,1]]

# Get the indices at which we need to columns change
shift_idx1 = np.append(0,np.where(np.diff(idx1[:,0])>0)[0]+1)

# Use the changing indices as boundaries for add.reduceat to add 
# groups of rows from extended version of X
dWOut[:,np.unique(idx1[:,0])] += np.add.reduceat(X_ext1,shift_idx1,axis=0).T

# ------ STAGE #2 : Vectorize calculations for "dW[:,y[i]] -= X[i,:].T" -------
# Repeat same philsophy for this second stage, except we need to index into y.
# So, that would involve sorting and also the iterator involved is just "i".
idx2 = idx1[idx1[:,1].argsort()]
cols_idx1 = y[idx2[:,1]]
X_ext2 = X[idx2[:,1]]
sort_idx = (y[idx2[:,1]]).argsort()
X_ext2 = X_ext2[sort_idx]
shift_idx2 = np.append(0,np.where(np.diff(cols_idx1[sort_idx])>0)[0]+1)
dWOut[:,np.unique(cols_idx1)] -= np.add.reduceat(X_ext2,shift_idx2,axis=0).T

1 Comment

Thanks. Althougn it has lines of codes, it is easier to understand.
-1

You can do this:

ff = (flag > 0) * 1
ff = ff.reshape((5, 4, 1, 1))
XX = ff * X
[ii, jj] = np.meshgrid(np.arange(5), np.arange(4))
dW[:, jj] += XX[ii, jj, ii, :].transpose((2, 0, 1))
dW[:, y[ii]] -= XX[ii, jj, ii, :].transpose((2, 0, 1))

You can further merge and fold these expressions to get a one-liner but it won't add any more performance.

Update #1: Yep, sorry this is not giving correct results, I had a typo in my check

2 Comments

Since X is only ever indexed at [x, y, x, z], can it not be made 3 dimensional?
I think your first three lines are better spelt XX = np.where(flag[...,np.newaxis,np.newaxis], X, 0)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.