creating sparse matrix of unknown size

Question

I have a text file with each line indicating an edge on a graph, for example

2 5 1

indicates an edge of weight 1 between nodes 2 and 5. I want to create a sparse adjacency matrix using these tuples. Typically, I'd initialize a sparse matrix as

G = scipy.sparse.lil_matrix((n,n))

where n is the number of nodes in the graph. But in this case, I do not know what 'n' is. Is there a more efficient way to create the matrix than looping over the lines of the file to find the max node index, creating the lil_matrix and then again looping over the file ? My current implementation is this:

n = 0
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        n = np.max([n,temp[0],temp[1]])
G = sp.lil_matrix((n,n))
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        G[temp[0],temp[1]] = temp[2]

hpaulj · Accepted Answer · 2016-08-08 04:38:30Z

4

The original, and still prototypical, way of creating a sparse matrix is to collect all inputs in row, col, data arrays (or lists), and use coo_matrix to construct the matrix. Shape can be deduced from those inputs (maximum index values), or given as a parameter.

To adapt your code

row, col, data = [],[],[]
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        # G[temp[0],temp[1]] = temp[2]
        data.append(temp[2])
        row.append(temp[0])
        col.append(temp[1])
G = sparse.coo_matrix((data, (row,col))

List appends are at least as fast as line reads, and better than sparse matrix inserts, even lil (lil assignment involves list appends as well).

I suspect you could also do:

A = np.genfromtxt(gfile, dtype=int) # default white space delimiter
# A should now be a 2d 3 column array
G = sparse.coo_matrix((A[:,2], (A[:,0], A[:,1]))

That is read the whole file with genfromtxt or loadtxt and create the sparse matrix from the resulting columns.

(When I made sparse matrices in MATLAB years ago, I used this sort of data, col, row initialization, though with a clever use of indexing to assemble those arrays from finite element blocks without loops.)

answered Aug 8, 2016 at 4:38

hpaulj

233k14 gold badges260 silver badges392 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

NSR Over a year ago

thanks! On your loadtxt comment: I think that loads the entire txt file, which i wanted to avoid, since some of these files can be pretty large.

hpaulj Over a year ago

New genfromtxt takes a max_rows parameter. And all versions let you feed lines via your own filter. In any case, genfromtxt isn't any faster than doing your own line read and parse. It's doing the same thing.

Collectives™ on Stack Overflow

creating sparse matrix of unknown size

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related