1

I am trying to optimize the array multiplication of the following type in fortran:

    do i = 1, n
    do j = 1, n
    do k = 1, n
            do i1 = 1, n
            do j1 = 1, n
            do k1 = 1, n
                    B(k1,j1,i1) = B(k1,j1,i1) + &
                    A(k,j,i)*v1(k,k1)*v2(j,j1)*v3(i,i1)
            enddo
            enddo
            enddo
    enddo
    enddo
    enddo

Here A and B are of size (n,n,n) while v1, v2, and v3 are (n,n). It seems that there ought to be vectorization of some sort that would allow me to speed up this evaluation. I tried different ordering of the indices. Is there a faster way of calculating B?

EDIT:

Alright, one can use matmul to obtain about 10x speed up:

    do i = 1, n
    do i1 = 1, n
    do j = 1, n
    do j1 = 1, n
            B(:,j1,i1) = B(:,j1,i1) + &   
            matmul(A(:,j,i),v1)*v2(j,j1)*v3(i,i1)                                             
    enddo
    enddo
    enddo
    enddo

Can we do even better?

4
  • Just so I can run through some ideas in my head, roughly how big is n? Commented Jan 20, 2022 at 14:41
  • n is usually below ten, but in extreme cases, I could expect it to be even a couple of hundreds. Currently, I am trying ns around 50. Commented Jan 20, 2022 at 14:55
  • In a hurry to go out, but I suspect you can lower the order fro N^6 to N^4. Note Sum_i A(k,j,i)*v(i,i1) are independent of k1 and j1 and so can be hoisted out the loop - and doing this in turn I suspect (but I haven't convinced myself quite yet) will reduce the order of the calcs to 3 sets of quadruply nested loops, rather than 1 set of six deep. Commented Jan 20, 2022 at 15:06
  • 1
    Ahh, which is what I see @veryreverie seems to be heading towards Commented Jan 20, 2022 at 15:11

1 Answer 1

1

You can at the very least reduce two of the loop pairs to matrix multiplication, using either matmul or BLAS/LAPACK, e.g.

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1) &
            & + matmul(matmul(transpose(v1), A(:,:,i)), v2) * v3(i,i1)
  enddo
enddo

You can almost certainly speed this up further by caching the results of the double matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1) + intermediate(:,:,i) * v3(i,i1)
  enddo
enddo

Which might in turn be sped up with more matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do j1=1,n
  B(:,j1,:) = matmul(intermediate(:,j1,:), v3))
enddo
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much! This works very well. Immediately after posting the question, I realized that the matmul can help me reduce one of the loops. This trick speeds it up additional few times.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.