For context: I am building a visual graph of a protein-protein interaction network. Each node corresponds to a protein, and an edge indicates an interaction between two proteins.

Here is my code:

First I import all the modules and files that I need:

import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

interactome_edges = pd.read_csv("*a_directory*", delimiter = "\t", header = None)
interactome_nodes = pd.read_csv("*a_directory*", delimiter = "\t", header = None)

# A few adjustments for the dataframes
interactome_nodes = interactome_nodes.drop(columns = [0])
interactome_edges.columns = ["node1","node2"]

Dataframe for nodes looks like this:

    1
0   MET3
1   IMD3
2   OLE1
3   MUP1
4   PIS1
...

Dataframe for edges looks like this:

    node1   node2
0   MET3    MET3
1   IMD3    IMD4
2   OLE1    OLE1
3   MUP1    MUP1
4   PIS1    PIS1
...

Each edge goes from node1 to node2.

Now I iterate through each row of the node dataframe and the edge dataframe and add them as networkx nodes and edges.

interactome = nx.Graph()

# Adding Nodes to Graph
for index, row in interactome_nodes.iterrows():
    interactome.add_nodes_from(row)

# Adding Edges to Graph
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1", "node2"]) #### Here is the problem

My problem is in the edge-adding step. I am currently getting the following error:

KeyError: ('node1', 'node2')

I have also tried:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from((row["node1"],row["node2"]))

and:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1"],row["node2"])

and also simply:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row)

All of these give me some form of error.

How can I use my node to node dataframe as edges for a networkx graph?

1 Answer

In [9]: import networkx as nx

In [10]: import pandas as pd

In [11]: df = pd.read_csv("a.csv")

In [12]: df
Out[12]:
  node1 node2
0  MET3  MET3
1  IMD3  IMD4
2  OLE1  OLE1
3  MUP1  MUP1
4  PIS1  PIS1

In [13]: G=nx.from_pandas_edgelist(df, "node1", "node2")

In [14]: [e for e in G.edges]
Out[14]:
[('MET3', 'MET3'),
 ('IMD3', 'IMD4'),
 ('OLE1', 'OLE1'),
 ('MUP1', 'MUP1'),
 ('PIS1', 'PIS1')]

NetworkX has methods for reading from a pandas dataframe. I have used the edge dataframe provided in the question. Here, the from_pandas_edgelist method builds the graph directly from the dataframe of edges, taking the names of the source and target columns as arguments.
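If you would rather keep building the graph by hand, the loop in the question can also be fixed. `row["node1", "node2"]` looks up the single tuple key `("node1", "node2")`, which is why pandas raises the KeyError, and `add_edges_from` expects an iterable of `(u, v)` pairs rather than one pair. Below is a minimal sketch, assuming the column names from the question; the two small dataframes are hypothetical stand-ins for the real files.

```python
import networkx as nx
import pandas as pd

# Hypothetical stand-ins for the two dataframes read from file in the question.
interactome_nodes = pd.DataFrame({1: ["MET3", "IMD3", "OLE1", "MUP1", "PIS1"]})
interactome_edges = pd.DataFrame(
    {
        "node1": ["MET3", "IMD3", "OLE1", "MUP1", "PIS1"],
        "node2": ["MET3", "IMD4", "OLE1", "MUP1", "PIS1"],
    }
)

interactome = nx.Graph()

# add_nodes_from takes an iterable of nodes, so a whole column can be passed at once.
interactome.add_nodes_from(interactome_nodes[1])

# add_edges_from takes an iterable of (u, v) tuples; zip pairs the columns row by row.
interactome.add_edges_from(
    zip(interactome_edges["node1"], interactome_edges["node2"])
)

# Row-by-row equivalent: add_edge (singular) takes the two endpoints directly.
# Re-adding an existing edge to an nx.Graph does not create a duplicate.
for _, row in interactome_edges.iterrows():
    interactome.add_edge(row["node1"], row["node2"])
```

Note that from_pandas_edgelist only creates nodes that appear in at least one edge, so adding the node list separately, as above, keeps proteins with no recorded interactions in the graph.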

To plot the graph and save it to a file:

import matplotlib.pyplot as plt

nx.draw_planar(G, with_labels = True)
plt.savefig("filename2.png")
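One caveat: the planar layout only works for planar graphs, and NetworkX raises an exception otherwise. A full interactome may well not be planar, so it can help to check first and fall back to another layout. Here is a rough sketch; the complete graph on five nodes is a hypothetical stand-in chosen because it is known to be non-planar, and the output filename is likewise made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical stand-in graph: K5 is non-planar, so the fallback branch is used.
G = nx.complete_graph(5)

# check_planarity returns (is_planar, certificate).
planar, _ = nx.check_planarity(G)

# Use the planar layout when possible, otherwise a force-directed layout.
# The seed only makes the spring layout reproducible between runs.
pos = nx.planar_layout(G) if planar else nx.spring_layout(G, seed=42)

nx.draw(G, pos, with_labels=True)
plt.savefig("interactome_layout.png")
```

For larger networks the spring layout can be slow and cluttered, so treat this as a starting point rather than a recommended layout.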
