Optimizing array multiplication in fortran

Question

I am trying to optimize the array multiplication of the following type in fortran:

    do i = 1, n
    do j = 1, n
    do k = 1, n
            do i1 = 1, n
            do j1 = 1, n
            do k1 = 1, n
                    B(k1,j1,i1) = B(k1,j1,i1) + &
                    A(k,j,i)*v1(k,k1)*v2(j,j1)*v3(i,i1)
            enddo
            enddo
            enddo
    enddo
    enddo
    enddo

Here A and B are of size (n,n,n) while v1, v2, and v3 are (n,n). It seems that there ought to be vectorization of some sort that would allow me to speed up this evaluation. I tried different ordering of the indices. Is there a faster way of calculating B?

EDIT:

Alright, one can use matmul to obtain about 10x speed up:

    do i = 1, n
    do i1 = 1, n
    do j = 1, n
    do j1 = 1, n
            B(:,j1,i1) = B(:,j1,i1) + &   
            matmul(A(:,j,i),v1)*v2(j,j1)*v3(i,i1)                                             
    enddo
    enddo
    enddo
    enddo

Can we do even better?

Just so I can run through some ideas in my head, roughly how big is n? — Ian Bush
– Ian Bush, Commented Jan 20, 2022 at 14:41
n is usually below ten, but in extreme cases, I could expect it to be even a couple of hundreds. Currently, I am trying ns around 50. — Djole
– Djole, Commented Jan 20, 2022 at 14:55
In a hurry to go out, but I suspect you can lower the order fro N^6 to N^4. Note Sum_i A(k,j,i)*v(i,i1) are independent of k1 and j1 and so can be hoisted out the loop - and doing this in turn I suspect (but I haven't convinced myself quite yet) will reduce the order of the calcs to 3 sets of quadruply nested loops, rather than 1 set of six deep. — Ian Bush
– Ian Bush, Commented Jan 20, 2022 at 15:06
Ahh, which is what I see @veryreverie seems to be heading towards — Ian Bush
– Ian Bush, Commented Jan 20, 2022 at 15:11

veryreverie · Accepted Answer · 2022-01-20 14:15:57Z

1

You can at the very least reduce two of the loop pairs to matrix multiplication, using either matmul or BLAS/LAPACK, e.g.

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1) &
            & + matmul(matmul(transpose(v1), A(:,:,i)), v2) * v3(i,i1)
  enddo
enddo

You can almost certainly speed this up further by caching the results of the double matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do i1=1,n
  do i=1,n
    B(:,:,i1) = B(:,:,i1) + intermediate(:,:,i) * v3(i,i1)
  enddo
enddo

Which might in turn be sped up with more matmul, e.g.

real :: intermediate(n,n,n)

do i=1,n
  intermediate(:,:,i) = matmul(matmul(transpose(v1), A(:,:,i)), v2)
enddo

do j1=1,n
  B(:,j1,:) = matmul(intermediate(:,j1,:), v3))
enddo

edited Jan 20, 2022 at 14:15

answered Jan 20, 2022 at 14:09

veryreverie

3,0012 gold badges16 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Djole Over a year ago

Thank you very much! This works very well. Immediately after posting the question, I realized that the matmul can help me reduce one of the loops. This trick speeds it up additional few times.

Collectives™ on Stack Overflow

Optimizing array multiplication in fortran

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related