9

I have a big CSV file that lists connections between nodes in a graph. Example:

0001,95784
0001,98743
0002,00082
0002,00091

This means that node 0001 is connected to nodes 95784 and 98743, and so on. I need to read this into a sparse matrix in NumPy. How can I do this? I am new to Python, so tutorials on this would also help.

2
  • What do you mean by '0001 is connected to 95784', in the terms of the matrix you want to have? Commented Dec 21, 2009 at 8:44
  • By this I mean that the node (id: 0001) has a directed link to the node (id: 95784). Commented Dec 21, 2009 at 9:57

3 Answers

12

Here is an example using SciPy's lil_matrix (list-of-lists sparse matrix). From its documentation:

Row-based linked list matrix.

This contains a list (self.rows) of rows, each of which is a sorted list of column indices of non-zero elements. It also contains a list (self.data) of lists of these elements.

$ cat 1938894-simplified.csv
0,32
1,21
1,23
1,32
2,23
2,53
2,82
3,82
4,46
5,75
7,86
8,28

Code:

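#!/usr/bin/env python

import csv
from scipy import sparse

rows, columns = 10, 100
matrix = sparse.lil_matrix((rows, columns))

# Each CSV line is "row,column"; mark that cell as a non-zero entry.
# Assigning through matrix[row, column] keeps matrix.rows and matrix.data in sync.
with open('1938894-simplified.csv') as f:
    for line in csv.reader(f):
        row, column = map(int, line)
        matrix[row, column] = 1

# matrix.rows holds, for each row, the sorted column indices of its non-zero entries.
print(matrix.rows.tolist())

Output:

[[32], [21, 23, 32], [23, 53, 82], [82], [46], [75], [], [86], [28], []]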

5 Comments

Exactly what I needed. Any good resources for scipy that you can recommend?
One small question: the numbers in the CSV are not indices, they are IDs. For example, the file starts with:

0001001,9304045
0001001,9308122
0001001,9309097
0001001,9311042
0001001,9401139
0001001,9404151
0001001,9407087
0001001,9408099
0001001,9501030
0001001,9503124

How do I convert these IDs to numerical indices? The IDs only serve to identify nodes, so they can be replaced by equivalent indices as long as those are unique. I know I can make rows and columns as large as the largest ID, but that seems wasteful, since indices such as 0 to 1001 would go unused.
I understand your concern. I assume there is no single best way to 'compress' your data to the relevant elements; it depends largely on what you want to do with the data later. For example, you could use a 'mapping dictionary' that maps the actual IDs to smaller numerical values ...
If you do want to 'squeeze' your indices so that they start at 0 and go up in increments of 1 to some maximum, why not (1) sort them, producing sorted_ixs (sorted_ixs = ixs; sorted_ixs.sort()), (2) call zip(sorted_ixs, range(len(sorted_ixs))), producing a list of pairs matching each index with a 'squeezed index', and (3) use that list as a 'translation table' from old to new indices.
Actually this will also sort ixs, I think; use sorted_ixs = ixs[:] if you want to keep your unsorted ixs around.
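
The 'mapping dictionary' idea from the comments above could look roughly like the sketch below. It is only an illustration under a few assumptions: the sample data, the name index_of, and the choice to assign indices in order of first appearance (rather than sorted order) are not from the original thread.

```python
import csv
import io

# Hypothetical sample in the ID format quoted in the comment; a real script
# would open the CSV file instead.
sample = io.StringIO("0001001,9304045\n0001001,9308122\n9304045,9308122\n")

index_of = {}        # node ID (kept as a string) -> compact index 0, 1, 2, ...
rows, cols = [], []  # compact (row, column) index of every edge

for src, dst in csv.reader(sample):
    # setdefault assigns the next free index the first time an ID is seen
    # and returns the existing index on every later occurrence.
    rows.append(index_of.setdefault(src, len(index_of)))
    cols.append(index_of.setdefault(dst, len(index_of)))

print(index_of)
print(rows, cols)
```

With n = len(index_of), the rows and cols lists can then be used as coordinates for an n-by-n sparse matrix, for example through scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n)) with one data value per edge.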
2

If you want an adjacency matrix, you can do something like:

import csv
from scipy.sparse import dok_matrix

# The shape must be large enough to hold the largest node index in the file.
S = dok_matrix((10000, 10000), dtype=bool)
with open("your_file_name") as f:
    for line in csv.reader(f):
        # line[0] is the source node, line[1] the target node
        S[int(line[0]), int(line[1])] = True
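
For larger files, one alternative to filling the matrix cell by cell is to collect all edge coordinates first and build the matrix in one step. The sketch below assumes the integer-index format of the simplified file shown earlier; the in-memory sample, the shape, and the variable names are illustrative, not part of the original answer.

```python
import csv
import io

import numpy as np
from scipy import sparse

# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO("0,32\n1,21\n1,23\n1,32\n2,23\n")

# Collect the row and column index of every edge.
rows, cols = [], []
for src, dst in csv.reader(sample):
    rows.append(int(src))
    cols.append(int(dst))

# One True value per edge; build in COO form, then convert to CSR
# for fast row slicing.
data = np.ones(len(rows), dtype=bool)
adjacency = sparse.coo_matrix((data, (rows, cols)), shape=(10, 100)).tocsr()

print(adjacency.nnz)                          # number of stored edges
print(sorted(adjacency[1].indices.tolist()))  # nodes linked from node 1
```

CSR is a reasonable target format when the matrix is mostly read afterwards; dok_matrix and lil_matrix remain the more convenient choices while entries are still being added one at a time.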


2

You might also be interested in NetworkX, a pure-Python network/graph package.

From the website:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

>>> import networkx as nx
>>> G = nx.Graph()
>>> G.add_edge(1, 2)
>>> G.add_node("spam")
>>> print(list(G.nodes()))
[1, 2, 'spam']
>>> print(list(G.edges()))
[(1, 2)]

