I have a CSV file that represents the adjacency matrix of a graph. However, both the first row and the first column of the file contain the node labels. How can I read this file into a networkx graph object? Is there a neat, Pythonic way to do it without hacking around?

My attempt so far:

x = np.loadtxt('file.mtx', delimiter='\t', dtype=str)  # np.str was removed from recent NumPy; use the builtin str
row_headers = x[0,:]
col_headers = x[:,0]
A = x[1:, 1:]
A = np.array(A, dtype='int')

But of course this doesn't solve the problem since I need the labels for the nodes in the graph creation.

Example of the data:

Attribute,A,B,C
A,0,1,1
B,1,0,0
C,1,0,0

Note that the delimiter in the actual file is a tab, not a comma.

  • So these labels are duplicated in the first row and the first column, and are therefore redundant? You could just use pandas, which will use the labels as column names, and then build the graph. Commented Jul 15, 2014 at 10:46
  • Can you post some data as well? Commented Jul 15, 2014 at 10:47
  • Does this help? stackoverflow.com/questions/15009615/… Commented Jul 15, 2014 at 10:58

2 Answers

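You could read the data into a structured array. The labels can be obtained from x.dtype.names, and the networkx graph can then be generated using nx.from_numpy_matrix (this function was removed in NetworkX 3.0; on newer versions, nx.from_numpy_array is the replacement):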

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# read the first line to determine the number of columns
# (text mode, so that splitting on the str '\t' also works on Python 3)
with open('file.mtx', 'r') as f:
    ncols = len(next(f).split('\t'))

x = np.genfromtxt('file.mtx', delimiter='\t', dtype=None, names=True,
                  usecols=range(1,ncols) # skip the first column
                  )
labels = x.dtype.names

# y is a view of x, so it will not require much additional memory
y = x.view(dtype=('int', len(x.dtype)))

G = nx.from_numpy_matrix(y)
G = nx.relabel_nodes(G, dict(zip(range(ncols-1), labels)))

print(G.edges(data=True))
# [('A', 'C', {'weight': 1}), ('A', 'B', {'weight': 1})]

nx.from_numpy_matrix has a create_using parameter that specifies the type of networkx graph to create. For example,

G = nx.from_numpy_matrix(y, create_using=nx.DiGraph())

makes G a DiGraph.
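For readers on newer library versions, the same idea can be sketched without a structured array. This is only a minimal illustration, assuming NetworkX 2.0 or later (where nx.from_numpy_array is available) and using an in-memory copy of the question's sample data in place of file.mtx; the names raw, table and G_arr are made up for the example.

```python
import io

import networkx as nx
import numpy as np

# Assumption: an in-memory, tab-separated stand-in for 'file.mtx'
raw = io.StringIO("Attribute\tA\tB\tC\nA\t0\t1\t1\nB\t1\t0\t0\nC\t1\t0\t0\n")

# Read everything as strings first, as in the question's attempt
table = np.loadtxt(raw, delimiter="\t", dtype=str)

# The first row (minus the corner cell) holds the node labels
labels = table[0, 1:].tolist()

# The remaining block is the numeric adjacency matrix
A = table[1:, 1:].astype(int)

# Build a directed graph, then replace the integer node ids with the labels
G_arr = nx.from_numpy_array(A, create_using=nx.DiGraph())
G_arr = nx.relabel_nodes(G_arr, dict(enumerate(labels)))

print(sorted(G_arr.edges()))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('C', 'A')]
```

Reading the whole table as strings costs a little extra memory compared with the view trick above, so for very large matrices the structured-array approach may still be preferable.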


This would work, though I am not sure it is the best way:

In [23]:

import pandas as pd
import io
import networkx as nx
temp = """Attribute,A,B,C
A,0,1,1
B,1,0,0
C,1,0,0"""
# for your file, pass the path to read_csv instead and use sep='\t'
df = pd.read_csv(io.StringIO(temp))
df
Out[23]:
  Attribute  A  B  C
0         A  0  1  1
1         B  1  0  0
2         C  1  0  0

In [39]:

G = nx.DiGraph()
for col in df:
    for x in list(df.loc[df[col] == 1,'Attribute']):
        G.add_edge(col,x)

G.edges()
Out[39]:
[('C', 'A'), ('B', 'A'), ('A', 'C'), ('A', 'B')]

In [40]:

nx.draw(G)
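If the label column is used as the DataFrame index, the explicit loop may not be needed at all. The following is a hedged sketch, assuming a NetworkX version that provides nx.from_pandas_adjacency (2.1 or later, as far as I know) and a tab-separated, in-memory copy of the sample data; the names raw, adj and G_adj are illustrative only.

```python
import io

import networkx as nx
import pandas as pd

# Assumption: tab-separated sample mirroring the question's data
raw = "Attribute\tA\tB\tC\nA\t0\t1\t1\nB\t1\t0\t0\nC\t1\t0\t0\n"

# index_col=0 turns the first column into the row labels,
# so the frame itself is a labelled adjacency matrix
adj = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

# Row label -> column label for every nonzero entry
G_adj = nx.from_pandas_adjacency(adj, create_using=nx.DiGraph())

print(sorted(G_adj.edges()))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('C', 'A')]
```

Note that this reads an entry in row r and column c as an edge from r to c, whereas the loop above adds the edge from the column label to the row label. For a symmetric matrix like the sample, both give the same edge set.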

