I have two NxN matrices that I want to multiply together: A and B. In NumPy, I used:
import numpy as np
C = np.dot(A, B)
However, I happen to know that for matrix B only row n and column n are non-zero (this comes directly from the analytical formula that produced the matrix and is without a doubt always the case).
Hoping to take advantage of this fact and reduce the number of multiplications needed to produce C, I replaced the above with:
import numpy as np

C = np.empty((N, N))  # pre-allocate the result
for row in range(0, N):
    for col in range(0, N):
        if col != n:
            # Only B[n, col] is non-zero in this column: just one scalar multiplication
            C[row, col] = A[row, n] * B[n, col]
        else:
            # Column n of B is fully populated, so a full dot product is needed
            C[row, col] = np.dot(A[row], B[:, n])
Analytically, this should reduce the total complexity as follows. In the general case (no fancy tricks, just basic matrix multiplication), C = AB with A and B both NxN is O(N^3): all N rows must multiply all N columns, and each of those dot products contains N multiplications, so O(N*N*N) = O(N^3).
Exploiting the structure of B as I've done above, however, should go as O(N^2 + N^2) = O(2N^2) = O(N^2). That is, all N rows must still multiply all N columns, but for every column m != n only one scalar multiplication is required, since only one element of B[:, m] is non-zero. The exception is column n of B, which every one of the N rows of A must multiply with a full dot product of N scalar multiplications.
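For reference, the same O(N^2) idea can also be written with vectorized NumPy calls rather than a Python loop. This is only a sketch under the stated assumption (row n and column n are the only non-zero parts of B), and the helper name structured_multiply is just illustrative:

import numpy as np

def structured_multiply(A, B, n):
    # For every column m != n, the only non-zero entry of B[:, m] is B[n, m],
    # so C[i, m] = A[i, n] * B[n, m]: an outer product, O(N^2) multiplications.
    C = np.outer(A[:, n], B[n, :])
    # Column n of B is fully populated, so that output column needs a full
    # matrix-vector product, another O(N^2).
    C[:, n] = np.dot(A, B[:, n])
    return C

The point of writing it this way is that all the heavy lifting stays inside NumPy's compiled routines, which is exactly what the explicit double loop gives up.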
However, the first block of code (using np.dot(A, B)) is substantially faster. I'm aware (via information like: Why is matrix multiplication faster with numpy than with ctypes in Python?) that the low-level implementation details of np.dot are likely responsible for this. So my question is this: how can I exploit the structure of matrix B to improve multiplication efficiency without sacrificing the implementation efficiency of NumPy, and without building my own low-level matrix multiplication routine in C?
This method is part of a numerical optimization over many variables, hence, O(N^3) is intractable whereas O(N^2) will likely get the job done.
Thank you for any help. Also, I'm new to SO, so please pardon any newbie errors.
Cython, or some other way of compiling your multiplication function directly into machine code? In the good ole' days, I probably would have used f2py for this, but I know that not everyone wants to write code in Fortran ;-)

scipy.sparse: you can make B a sparse matrix with B = scipy.sparse.csr_matrix(B) and then just do A * B; if you multiply dense by sparse, the result is dense. My gut feeling is that this is more efficient, but I have not tested it.
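A minimal sketch of that scipy.sparse suggestion, with made-up example sizes (N, n) and the same assumed structure of B; its performance is not verified here:

import numpy as np
import scipy.sparse

# Example setup: only row n and column n of B are non-zero.
N, n = 1000, 3
A = np.random.rand(N, N)
B = np.zeros((N, N))
B[n, :] = np.random.rand(N)   # non-zero row n
B[:, n] = np.random.rand(N)   # non-zero column n

B_sparse = scipy.sparse.csr_matrix(B)  # stores only the ~2N non-zero entries
C = A * B_sparse                       # dense * sparse matrix product; the result comes back dense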