What is most efficient way of setting row to zeros for a sparse scipy matrix?

Question

I'm trying to convert the following MATLAB code to Python and am having trouble finding a solution that works in any reasonable amount of time.

M = diag(sum(a)) - a;
where = vertcat(in, out);
M(where,:) = 0;
M(where,where) = 1;

Here, a is a sparse matrix and where is a vector (as are in/out). The solution I have using Python is:

M = scipy.sparse.diags([degs], [0]) - A
where = numpy.hstack((inVs, outVs)).astype(int)
M = scipy.sparse.lil_matrix(M)
M[where, :] = 0  # This is the slowest line
M[where, where] = 1
M = scipy.sparse.csc_matrix(M)

But since A is 334863x334863, this takes like three minutes. If anyone has any suggestions on how to make this faster, please contribute them! For comparison, MATLAB does this same step imperceptibly fast.

Thanks!

@DennisJaheruddin yeah it is, but assignment forcefully adds all those 0s, adding len(where)*334863 elements to the matrix. they might be later pruned with eliminate_zeros(), but operation can't be considered effective. — alko
– alko, Commented Nov 5, 2013 at 14:44
Oops, it did generate the same answer, didn't it? Apologies! That question didn't come up in any of my searches for fancy indexing in python or in my searches for matlab-like assignment for scipy sparse matrices... — Alex Reinking
– Alex Reinking, Commented Nov 5, 2013 at 17:14
Yes, I added the duplicate flag after you linked to it, @alko. Marking a question as having been answered elsewhere doesn't mean I think that Alex didn't do sufficient research! I actually do it because it benefits future visitors, who will find one common set of answers, regardless of whichever question comes up on a google or Stack Overflow search. — askewchan
– askewchan, Commented Nov 6, 2013 at 1:05

Jaime · Accepted Answer · 2013-11-06 03:51:46Z

A slightly different take on alko/seberg's approach. I find for loops disturbing, so I spent the better part of this morning figuring a way to get rid of it. The following is not always faster than the other approach. It performs better the more rows there are to be zeroed and the sparser the matrix:

def csr_zero_rows(csr, rows_to_zero):
    rows, cols = csr.shape
    mask = np.ones((rows,), dtype=np.bool)
    mask[rows_to_zero] = False
    nnz_per_row = np.diff(csr.indptr)

    mask = np.repeat(mask, nnz_per_row)
    nnz_per_row[rows_to_zero] = 0
    csr.data = csr.data[mask]
    csr.indices = csr.indices[mask]
    csr.indptr[1:] = np.cumsum(nnz_per_row)

And to test drive both approaches:

rows, cols = 334863, 334863
a = sps.rand(rows, cols, density=0.00001, format='csr')
b = a.copy()
rows_to_zero = np.random.choice(np.arange(rows), size=10000, replace=False)

In [117]: a
Out[117]: 
<334863x334863 sparse matrix of type '<type 'numpy.float64'>'
    with 1121332 stored elements in Compressed Sparse Row format>

In [118]: %timeit -n1 -r1 csr_rows_set_nz_to_val(a, rows_to_zero)
1 loops, best of 1: 75.8 ms per loop

In [119]: %timeit -n1 -r1 csr_zero_rows(b, rows_to_zero)
1 loops, best of 1: 32.5 ms per loop

And of course:

np.allclose(a.data, b.data)
Out[122]: True

np.allclose(a.indices, b.indices)
Out[123]: True

np.allclose(a.indptr, b.indptr)
Out[124]: True

Community · Accepted Answer · 2017-05-23 10:27:55Z

The solution I use for similar task attributes to @seberg and do not convert to lil format:

import scipy.sparse
import numpy
import time

def csr_row_set_nz_to_val(csr, row, value=0):
    """Set all nonzero elements (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if not isinstance(csr, scipy.sparse.csr_matrix):
        raise ValueError('Matrix given must be of CSR format.')
    csr.data[csr.indptr[row]:csr.indptr[row+1]] = value

def csr_rows_set_nz_to_val(csr, rows, value=0):
    for row in rows:
        csr_row_set_nz_to_val(csr, row)
    if value == 0:
        csr.eliminate_zeros()

wrap your evaluations with timing

def evaluate(size):
    degs = [1]*size
    inVs = list(xrange(1, size, size/25))
    outVs = list(xrange(5, size, size/25))
    where = numpy.hstack((inVs, outVs)).astype(int)
    start_time = time.time()
    A = scipy.sparse.csc_matrix((size, size))
    M = scipy.sparse.diags([degs], [0]) - A
    csr_rows_set_nz_to_val(M, where)
    return time.time()-start_time

and test its performance:

>>> print 'elapsed %.5f seconds' % evaluate(334863)
elapsed 0.53054 seconds

Klapaucius · Accepted Answer · 2020-12-18 22:34:13Z

1

If you dislike digging around in the guts of the sparse matrices, you can also use sparse matrix multiplication with a diagonal matrix:

def zero_rows(M, rows_to_zero):

    ixs = numpy.ones(M.shape[0], int)
    ixs[rows_to_zero] = 0
    D = sparse.diags(ixs)
    res = D * M
    return res

answered Dec 18, 2020 at 22:34

Klapaucius

111 bronze badge

Collectives™ on Stack Overflow

What is most efficient way of setting row to zeros for a sparse scipy matrix?

3 Answers 3

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related