I am trying to optimize the array multiplication of the following type in fortran:
do i = 1, n
do j = 1, n
do k = 1, n
do i1 = 1, n
do j1 = 1, n
do k1 = 1, n
B(k1,j1,i1) = B(k1,j1,i1) + &
A(k,j,i)*v1(k,k1)*v2(j,j1)*v3(i,i1)
enddo
enddo
enddo
enddo
enddo
enddo
Here A and B are of size (n,n,n) while v1, v2, and v3 are (n,n). It seems that there ought to be vectorization of some sort that would allow me to speed up this evaluation. I tried different ordering of the indices. Is there a faster way of calculating B?
EDIT:
Alright, one can use matmul to obtain about 10x speed up:
do i = 1, n
do i1 = 1, n
do j = 1, n
do j1 = 1, n
B(:,j1,i1) = B(:,j1,i1) + &
matmul(A(:,j,i),v1)*v2(j,j1)*v3(i,i1)
enddo
enddo
enddo
enddo
Can we do even better?