I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:
import numpy as np
C = np.dot(A, B)
However, I happen to know that for matrix B only row n and column n are non-zero (this comes directly from the analytical formula that produced the matrix and is without a doubt always the case).
Hoping to take advantage of this fact and reduce the number of multiplications needed to produce C, I replaced the above with:
import numpy as np

C = np.empty((N, N))  # pre-allocate the result
for row in range(0, N):
    for col in range(0, N):
        if col != n:
            # Only B[n, col] is non-zero in this column: just one scalar multiplication
            C[row, col] = A[row, n] * B[n, col]
        else:
            # Column n of B is fully populated, so a full dot product is needed
            C[row, col] = np.dot(A[row], B[:, n])
Analytically, this should reduce the total complexity as follows. In the general case (no fancy tricks, just basic matrix multiplication), C = AB with A and B both NxN is O(N^3): all N rows must multiply all N columns, and each of those dot products contains N multiplications, so O(N*N*N) = O(N^3).
Exploiting the structure of B as I've done above, however, should go as O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows must still multiply all N columns, but for every column m != n only one scalar multiplication is required, since only one element of B[:, m] is non-zero. The exception is column n of B, which every one of the N rows of A must multiply with a full dot product of N scalar multiplications.
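For reference, the same O(N^2) idea can also be written with vectorized NumPy calls rather than a Python loop. This is only a sketch under the stated assumption (row n and column n are the only non-zero parts of B), and the helper name structured_multiply is just illustrative:

import numpy as np

def structured_multiply(A, B, n):
    # For every column m != n, the only non-zero entry of B[:, m] is B[n, m],
    # so C[i, m] = A[i, n] * B[n, m]: an outer product, O(N^2) multiplications.
    C = np.outer(A[:, n], B[n, :])
    # Column n of B is fully populated, so that output column needs a full
    # matrix-vector product, another O(N^2).
    C[:, n] = np.dot(A, B[:, n])
    return C

The point of writing it this way is that all the heavy lifting stays inside NumPy's compiled routines, which is exactly what the explicit double loop gives up.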
However, the first block of code (using np.dot(A, B)) is substantially faster. I'm aware (via information like: Why is matrix multiplication faster with numpy than with ctypes in Python?) that the low-level implementation details of np.dot are likely responsible for this. So my question is this: how can I exploit the structure of matrix B to improve multiplication efficiency without sacrificing the implementation efficiency of NumPy, and without building my own low-level matrix multiplication routine in C?
This method is part of a numerical optimization over many variables, hence, O(N^3) is intractable whereas O(N^2) will likely get the job done.
Thank you for any help. Also, I'm new to SO, so please pardon any newbie errors.
Cython, or some other way of compiling your multiplication function directly into machine code? In the good ole' days, I probably would have used f2py for this, but I know that not everyone wants to write code in Fortran ;-)

scipy.sparse: you can make B a sparse matrix with B = scipy.sparse.csr_matrix(B) and then just do A * B; if you multiply dense by sparse, the result is dense. My gut feeling is that this is more efficient, but I have not tested it.
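A minimal sketch of that scipy.sparse suggestion, with made-up example sizes (N, n) and the same assumed structure of B; its performance is not verified here:

import numpy as np
import scipy.sparse

# Example setup: only row n and column n of B are non-zero.
N, n = 1000, 3
A = np.random.rand(N, N)
B = np.zeros((N, N))
B[n, :] = np.random.rand(N)   # non-zero row n
B[:, n] = np.random.rand(N)   # non-zero column n

B_sparse = scipy.sparse.csr_matrix(B)  # stores only the ~2N non-zero entries
C = A * B_sparse                       # dense * sparse matrix product; the result comes back dense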