
Here is the original code:

K = zeros(N*N)
for a=1:N
    for i=1:I
        for j=1:J
            M = kron(X(:,:,a).',Y(:,:,a,i,j));

            %A function that essentially adds M to K. 
        end
    end
end

The goal is to vectorize the Kronecker product calls. My intuition is to think of X and Y as containers of matrices (for reference, the slices of X and Y being fed to kron are square matrices of order 7x7). Under this container scheme, X appears as a 1-D container and Y as a 3-D container. My next guess was to reshape Y into a 2-D, or better yet a 1-D, container and then do elementwise multiplication of X and Y. My questions are: how would I do this reshaping in a way that preserves the trace of M, and can MATLAB even handle this container idea, or do the containers need to be reshaped further to expose the inner matrix elements?
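For what it's worth, here is a minimal sketch of that container idea (assuming the 7x7 slice sizes mentioned above): reshape collapses Y's trailing i,j dims into a single index without touching the slices themselves, so every M, and hence its trace, is preserved exactly -

[m1,m2,N] = size(X);
[n1,n2,~,I,J] = size(Y);
Yc = reshape(Y, n1, n2, N, []);        % Y as a 2-D container: N-by-(I*J)
K = zeros(m2*n1, m1*n2);               % size of kron(X(:,:,a).', Y-slice)
for a = 1:N
    for k = 1:I*J
        K = K + kron(X(:,:,a).', Yc(:,:,a,k));  % same terms as the triple loop
    end
end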


1 Answer


Approach #1: Matrix multiplication with 6D permute

% Get sizes
[m1,m2,~] = size(X);
[n1,n2,N,I,J] = size(Y);

% Lose the third dim from X and Y with matrix-multiplication
parte1 = reshape(permute(Y,[1,2,4,5,3]),[],N)*reshape(X,[],N).';

% Rearrange the leftover dims to bring in the kron format
parte2 = reshape(parte1,[n1,n2,I,J,m1,m2]);

% Lose the dims corresponding to the last two dims coming in from Y,
% i.e. the iterative summation over i and j in the question
out = reshape(permute(sum(sum(parte2,3),4),[1,6,2,5,3,4]),m1*n1,m2*n2)
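In case that first step looks like magic: reshape(permute(Y,...),[],N) stacks each slice of Y vectorized as a column and reshape(X,[],N).' stacks each slice of X vectorized as a row, so the single multiplication accumulates the sum over a of all pairwise slice products in one call. A small self-contained check of that identity, with made-up 2x2x3 inputs -

A = rand(2,2,3); B = rand(2,2,3);
lhs = reshape(B,[],3)*reshape(A,[],3).';       % one matrix multiplication
rhs = zeros(4,4);                              % explicit sum of outer products
for a = 1:3
    rhs = rhs + reshape(B(:,:,a),[],1)*reshape(A(:,:,a),[],1).';
end
max(abs(lhs(:)-rhs(:)))                        % ~1e-16, i.e. identical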

Approach #2: Simple 7D permute

% Get sizes
[m1,m2,~] = size(X);
[n1,n2,N,I,J] = size(Y);

% Perform kron-format elementwise multiplication between the first two dims
% of X and Y, keeping the third dim aligned and "pushing out" leftover dims
% from Y to the back
mults = bsxfun(@times,permute(X,[4,2,5,1,3]),permute(Y,[1,6,2,7,3,4,5]));

% Collapse the trailing dims and lose them with a summation reduction for
% the final output
out = sum(reshape(mults,m1*n1,m2*n2,[]),3);
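As a side note not in the original answer: if your MATLAB is R2016b or newer, implicit expansion lets the bsxfun call be written as a plain elementwise product -

% Same 7D elementwise multiply via implicit expansion (R2016b+ assumed)
mults = permute(X,[4,2,5,1,3]) .* permute(Y,[1,6,2,7,3,4,5]);
out = sum(reshape(mults,m1*n1,m2*n2,[]),3);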

Verification

Here's a setup for running the original and the proposed approaches -

% Setup inputs
X = rand(10,10,10);
Y = rand(10,10,10,10,10);

% Original approach
[n1,n2,N,I,J] =  size(Y);
K = zeros(100);
for a=1:N
    for i=1:I
        for j=1:J
            M = kron(X(:,:,a).',Y(:,:,a,i,j));
            K = K + M;
        end
    end
end

% Approach #1
[m1,m2,~] = size(X);
[n1,n2,N,I,J] = size(Y);
parte1 = reshape(permute(Y,[1,2,4,5,3]),[],N)*reshape(X,[],N).';
parte2 = reshape(parte1,[n1,n2,I,J,m1,m2]);
out1 = reshape(permute(sum(sum(parte2,3),4),[1,6,2,5,3,4]),m1*n1,m2*n2);

% Approach #2
mults = bsxfun(@times,permute(X,[4,2,5,1,3]),permute(Y,[1,6,2,7,3,4,5]));
out2 = sum(reshape(mults,m1*n1,m2*n2,[]),3);

After running, we see the maximum absolute deviation of the proposed approaches from the original one -

>> error_app1 = max(abs(K(:)-out1(:)))
error_app1 =
   1.1369e-12
>> error_app2 = max(abs(K(:)-out2(:)))
error_app2 =
   1.1937e-12

Values look good to me!


Benchmarking

Timing these three approaches using the same big dataset as used for verification, we get something like this -

----------------------------- With Loop
Elapsed time is 1.541443 seconds.
----------------------------- With BSXFUN
Elapsed time is 1.283935 seconds.
----------------------------- With MATRIX-MULTIPLICATION
Elapsed time is 0.164312 seconds.

Seems like matrix-multiplication is doing fairly well for datasets of this size!
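If you want to reproduce such timings, here is a sketch using timeit; the three wrapper functions are hypothetical and would each contain one of the code blocks above -

% kron_loop, kron_bsxfun and kron_matmul are hypothetical wrappers, each
% running one of the three versions above and returning the result
t_loop = timeit(@() kron_loop(X,Y));
t_bsx  = timeit(@() kron_bsxfun(X,Y));
t_mm   = timeit(@() kron_matmul(X,Y));
fprintf('loop %.4fs | bsxfun %.4fs | matmul %.4fs\n', t_loop, t_bsx, t_mm);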


7 Comments

Just to be clear, in both new approaches, out1 and out2 replace K? If so, in the case where K = K+M is more complicated, is it possible to reflect this? For example, in the original code, what if each M was normalized by a constant that changes on each iteration of the loop? A second example would be if M had a boolean mask applied to it before adding to K. Would these extra steps break the output?
@MikeVandenberg For ex1, as a pre-processing step use: Y = bsxfun(@times,Y,permute(scale,[4,5,1,2,3])); and then use the proposed codes, where scale is the scaling array of size (N,I,J). For ex2, would you have a different mask for every iteration or the same one?
I guess I shouldn't have been simplistic about those steps because they are nontrivial. Here is the specific function that constructs K: pr = real(trace(E*M)); K = K + H(i,j,a)*M/pr; where H is a matrix known beforehand. So yes, on each iteration we are pulling a different slice of H and normalizing M by a different constant, which is the trace of the product E*M, where E is a boolean mask.
@MikeVandenberg Yup, that's the thing with vectorization, you can't expect to generalize things. They work mostly on a case-by-case basis. Good luck!
Hey, I've been fighting with this for the past few days and can't seem to make any headway without totally breaking the output. I was specifically trying to amend the matrix-multiplication approach. Would you recommend switching to the bsxfun approach instead? Regardless, I'm mostly stuck on the division by pr. How can we extract the trace of each slice without actually iterating?
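For completeness, one hedged sketch of how those per-slice traces might be pulled out without iterating, building on the bsxfun approach's stack of slices. E of size (m2*n2)-by-(m1*n1) and H of size (I,J,N) are assumptions read off the comments, and the identity used is trace(E*M) == sum(sum(E.'.*M)) -

% Sketch only, not from the answer above: vectorized trace normalization
Ms = reshape(mults, m1*n1, m2*n2, []);              % every M as one slice
pr = real(sum(sum(bsxfun(@times, E.', Ms), 1), 2)); % trace(E*M) per slice
w  = reshape(permute(H,[3,1,2]), 1, 1, []);         % H(i,j,a), a fastest
K  = sum(bsxfun(@times, Ms, w./pr), 3);             % weighted accumulation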
