12

I'm trying to convert the following MATLAB code to Python and am having trouble finding a solution that works in any reasonable amount of time.

M = diag(sum(a)) - a;
where = vertcat(in, out);
M(where,:) = 0;
M(where,where) = 1;

Here, a is a sparse matrix and where is a vector (as are in/out). The solution I have using Python is:

M = scipy.sparse.diags([degs], [0]) - A
where = numpy.hstack((inVs, outVs)).astype(int)
M = scipy.sparse.lil_matrix(M)
M[where, :] = 0  # This is the slowest line
M[where, where] = 1
M = scipy.sparse.csc_matrix(M)

But since A is 334863x334863, this takes like three minutes. If anyone has any suggestions on how to make this faster, please contribute them! For comparison, MATLAB does this same step imperceptibly fast.

Thanks!

9
  • Is M still sparse after that slow step? Commented Nov 5, 2013 at 14:40
  • 3
    @DennisJaheruddin yeah it is, but assignment forcefully adds all those 0s, adding len(where)*334863 elements to the matrix. they might be later pruned with eliminate_zeros(), but operation can't be considered effective. Commented Nov 5, 2013 at 14:44
  • possible duplicate of scipy.sparse : Set row to zeros Commented Nov 5, 2013 at 14:56
  • Oops, it did generate the same answer, didn't it? Apologies! That question didn't come up in any of my searches for fancy indexing in python or in my searches for matlab-like assignment for scipy sparse matrices... Commented Nov 5, 2013 at 17:14
  • 1
    Yes, I added the duplicate flag after you linked to it, @alko. Marking a question as having been answered elsewhere doesn't mean I think that Alex didn't do sufficient research! I actually do it because it benefits future visitors, who will find one common set of answers, regardless of whichever question comes up on a google or Stack Overflow search. Commented Nov 6, 2013 at 1:05

3 Answers 3

10

A slightly different take on alko/seberg's approach. I find for loops disturbing, so I spent the better part of this morning figuring a way to get rid of it. The following is not always faster than the other approach. It performs better the more rows there are to be zeroed and the sparser the matrix:

def csr_zero_rows(csr, rows_to_zero):
    rows, cols = csr.shape
    mask = np.ones((rows,), dtype=np.bool)
    mask[rows_to_zero] = False
    nnz_per_row = np.diff(csr.indptr)

    mask = np.repeat(mask, nnz_per_row)
    nnz_per_row[rows_to_zero] = 0
    csr.data = csr.data[mask]
    csr.indices = csr.indices[mask]
    csr.indptr[1:] = np.cumsum(nnz_per_row)

And to test drive both approaches:

rows, cols = 334863, 334863
a = sps.rand(rows, cols, density=0.00001, format='csr')
b = a.copy()
rows_to_zero = np.random.choice(np.arange(rows), size=10000, replace=False)

In [117]: a
Out[117]: 
<334863x334863 sparse matrix of type '<type 'numpy.float64'>'
    with 1121332 stored elements in Compressed Sparse Row format>

In [118]: %timeit -n1 -r1 csr_rows_set_nz_to_val(a, rows_to_zero)
1 loops, best of 1: 75.8 ms per loop

In [119]: %timeit -n1 -r1 csr_zero_rows(b, rows_to_zero)
1 loops, best of 1: 32.5 ms per loop

And of course:

np.allclose(a.data, b.data)
Out[122]: True

np.allclose(a.indices, b.indices)
Out[123]: True

np.allclose(a.indptr, b.indptr)
Out[124]: True
Sign up to request clarification or add additional context in comments.

Comments

9

The solution I use for similar task attributes to @seberg and do not convert to lil format:

import scipy.sparse
import numpy
import time

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

def csr_rows_set_nz_to_val(csr, rows, value=0):
    for row in rows:
        csr_row_set_nz_to_val(csr, row)
    if value == 0:
        csr.eliminate_zeros()

wrap your evaluations with timing

def evaluate(size):
    degs = [1]*size
    inVs = list(xrange(1, size, size/25))
    outVs = list(xrange(5, size, size/25))
    where = numpy.hstack((inVs, outVs)).astype(int)
    start_time = time.time()
    A = scipy.sparse.csc_matrix((size, size))
    M = scipy.sparse.diags([degs], [0]) - A
    csr_rows_set_nz_to_val(M, where)
    return time.time()-start_time

and test its performance:

>>> print 'elapsed %.5f seconds' % evaluate(334863)
elapsed 0.53054 seconds

Comments

1

If you dislike digging around in the guts of the sparse matrices, you can also use sparse matrix multiplication with a diagonal matrix:

def zero_rows(M, rows_to_zero):

    ixs = numpy.ones(M.shape[0], int)
    ixs[rows_to_zero] = 0
    D = sparse.diags(ixs)
    res = D * M
    return res

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.