3

currently I have the following portion of code:

for i = 2:N-1
  res(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end

where as the variables k, m, c and e are vectors of size N and x is a vector of size 2*N. Is there any way to do this a lot faster using something like arrayfun!? I couldn't figure this out :( I especially want to make it faster by running on the GPU later and thus, arrayfun would be also helpfull since matlab doesn't support parallelizing for-loops and I don't want to buy the jacket package... Thanks a lot!

2 Answers 2

6

You don't have to use arrayfun. It works if use use some smart indexing:

    clear all

    N=20;
    k=rand(N,1);
    m=rand(N,1);
    c=rand(N,1);
    e=rand(N,1);
    x=rand(2*N,1);

    % for-based implementation
    %Watch out, you are not filling the first element of forres!
    forres=zeros(N-1,1); %Initialize array first to gain some speed.
    for i = 2:N-1
      forres(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
    end

    %vectorized implementation
    parres=k(2:N-1)./m(2:N-1).*x(1:N-2) -(c(2:N-1)+c(3:N))./m(2:N-1).*x(N+2:2*N-1) +e(3:N)./m(2:N-1).*x(3:N);

    %compare results; strip the first element from forres
    difference=forres(2:end)-parres %#ok<NOPTS>
Sign up to request clarification or add additional context in comments.

3 Comments

I assume a (x(i)-x(i-1))^3 gets to (x(2:N-1)-x(1:N-2)).^3?
yes, you got it. the strategy is to replace the running index i with the range of indices in the for loop. i.e. for 1:N, i becomes 1:N, i+1 becomes 1+1:N+1.
The same indexing "tricks" are basically what's required to use arrayfun. Using arrayfun on the GPU is typically much faster than a sequence of vectorized operations.
5

Firstly, MATLAB does support parallel for loops via PARFOR. However, that doesn't have much chance of speeding up this sort of computation since the amount of computation is small compared to the amount of data you're reading and writing.

To restructure things for GPUArray "arrayfun", you need to make all the array references in the loop body refer to the loop iterate, and have the loop run across the full range. You should be able to do this by offsetting some of the arrays, and padding with dummy values. For example, you could prepend all your arrays with NaN, and replace x(i-1) with a new variable x_1 = [x(2:N) NaN]

7 Comments

Okay I wanted to try it: But the problem I'm seeing here is that when creating so many 'dummy' variables, I probably run out of memory quite soon and further, the copying to the GPU and the memory access to all the different variables will probably be pretty slow!
You're quite right, you may have problems with that. Unfortunately, arrayfun is quite inflexible in what it allows. If you really wanted ultimate speed, you could write a small amount of CUDA code and use parallel.gpu.CUDAKernel in MATLAB to execute it.
Yeah thanks, I already thought about this but just wanted a 'quick' solution and what user599884 suggested, helped me to create such a quick solution to make the for-loop faster. Do you think that a CUDA written code will really speed things up compared with user599884 solution running on the GPU by simply putting x = gpuArray(x), k=gpuArray(k) etc..
Yes, on the GPU either arrayfun or a hand-written kernel will generally beat vectorized code. You can roughly estimate how much by ignoring the index offsets and using arrayfun anyway to see how fast that is.
Edric, do you think it's also worth doing it completely as CUDA-mex-file which does everything in the mex file (like copying the data to the GPU etc...) or is a PTX kernel sufficient?! I wonder if there will be a big difference between those both variants.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.