2

I have a text file with each line indicating an edge on a graph, for example

2 5 1

indicates an edge of weight 1 between nodes 2 and 5. I want to create a sparse adjacency matrix using these tuples. Typically, I'd initialize a sparse matrix as

G = scipy.sparse.lil_matrix((n,n))

where n is the number of nodes in the graph. But in this case, I do not know what 'n' is. Is there a more efficient way to create the matrix than looping over the lines of the file to find the max node index, creating the lil_matrix and then again looping over the file ? My current implementation is this:

n = 0
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        n = np.max([n,temp[0],temp[1]])
G = sp.lil_matrix((n,n))
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        G[temp[0],temp[1]] = temp[2]

1 Answer 1

4

The original, and still prototypical, way of creating a sparse matrix is to collect all inputs in row, col, data arrays (or lists), and use coo_matrix to construct the matrix. Shape can be deduced from those inputs (maximum index values), or given as a parameter.

To adapt your code

row, col, data = [],[],[]
with open(gfile) as f:
    for line in f:
        temp = map(int,line.split())
        # G[temp[0],temp[1]] = temp[2]
        data.append(temp[2])
        row.append(temp[0])
        col.append(temp[1])
G = sparse.coo_matrix((data, (row,col))

List appends are at least as fast as line reads, and better than sparse matrix inserts, even lil (lil assignment involves list appends as well).

I suspect you could also do:

A = np.genfromtxt(gfile, dtype=int) # default white space delimiter
# A should now be a 2d 3 column array
G = sparse.coo_matrix((A[:,2], (A[:,0], A[:,1]))

That is read the whole file with genfromtxt or loadtxt and create the sparse matrix from the resulting columns.

(When I made sparse matrices in MATLAB years ago, I used this sort of data, col, row initialization, though with a clever use of indexing to assemble those arrays from finite element blocks without loops.)

Sign up to request clarification or add additional context in comments.

2 Comments

thanks! On your loadtxt comment: I think that loads the entire txt file, which i wanted to avoid, since some of these files can be pretty large.
New genfromtxt takes a max_rows parameter. And all versions let you feed lines via your own filter. In any case, genfromtxt isn't any faster than doing your own line read and parse. It's doing the same thing.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.