13

I have a Networkx graph called G created below:

import networkx as nx
G = nx.Graph()
G.add_node(1,job= 'teacher', boss = 'dee')
G.add_node(2,job= 'teacher', boss = 'foo')
G.add_node(3,job= 'admin', boss = 'dee')
G.add_node(4,job= 'admin', boss = 'lopez')

I would like to store the node number along with attributes, job and boss in separate columns of a pandas dataframe.

I attempted this with the code below, but it produces a dataframe with two columns: one with the node number and one with all of the attributes:

graph = G.nodes(data = True)
import pandas as pd
df = pd.DataFrame(graph)

df
Out[19]: 
    0                                      1
0  1  {u'job': u'teacher', u'boss': u'dee'}
1  2  {u'job': u'teacher', u'boss': u'foo'}
2  3    {u'job': u'admin', u'boss': u'dee'}
3  4  {u'job': u'admin', u'boss': u'lopez'}

Note: I acknowledge that NetworkX has a to_pandas_dataframe function but it does not provide a dataframe with the output I am looking for.

5 Answers

35

Here's a one-liner, where G is the graph object from the question:

pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index')

4 Comments

This is the more pythonic answer.
This does not work though if nodes have no attributes, then you get an empty DataFrame out.
@Mitar what would be the expected output for a graph with no attributes? A dataframe with only index?
Yes, ideally only index then.
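
One possible way to handle the attribute-less case raised in these comments is to reindex the result against the full node list. This is only a sketch and not part of the original answer: the helper name nodes_to_frame is made up for illustration, and G is assumed to be the graph object itself.

```python
import networkx as nx
import pandas as pd


def nodes_to_frame(G):
    """Hypothetical helper: one row per node, one column per attribute."""
    df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")
    # Nodes without any attributes can be missing from df at this point,
    # so reindex against the full node list to restore them (as NaN rows).
    return df.reindex(list(G.nodes))


G = nx.Graph()
G.add_node(1, job="teacher", boss="dee")
G.add_node(2)  # no attributes at all
mixed = nodes_to_frame(G)

H = nx.Graph()
H.add_nodes_from([1, 2])  # no node has attributes
bare = nodes_to_frame(H)  # index is kept, there are no columns
```

With this, a graph whose nodes carry no attributes should give a dataframe that has only the index, which is what the last comment asks for.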
6

I think this is even simpler, with G being the graph object itself:

pd.DataFrame.from_dict(G.nodes, orient='index')

This avoids having to convert to another dict first.

2 Comments

This does not work though if nodes have no attributes, then you get an empty DataFrame out.
I know this answer came 2 years late, but it should be the accepted answer
2

I don't know how representative your data is but it should be straightforward to modify my code to work on your real network:

In [32]:
data={}
data['node']=[x[0] for x in graph]
data['boss'] = [x[1]['boss'] for x in graph]
data['job'] = [x[1]['job'] for x in graph]
df1 = pd.DataFrame(data)
df1

Out[32]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4

So all I'm doing here is constructing a dict from the graph data. pandas accepts a dict as data, where the keys are the column names and the values have to be array-like, in this case lists of values.

A more dynamic method:

In [42]:
def func(graph):
    data={}
    data['node']=[x[0] for x in graph]
    other_cols = graph[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in graph]
    return data
pd.DataFrame(func(graph))

Out[42]:
    boss      job  node
0    dee  teacher     1
1    foo  teacher     2
2    dee    admin     3
3  lopez    admin     4

3 Comments

Thank you for your solution. The only part of the solution I do not understand is the x[0] for x in graph. I understand that graph is a list but what is happening in x[0] of x in graph?
You have a list of tuples. The first element of each tuple is the node value, hence x[0]; the second element is a dict, x[1].
There is a mistake. It should be def func(graph):.
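
The tuple structure described in these comments can also be unpacked directly instead of indexing with x[0] and x[1]. A minimal sketch, assuming G is the graph object from the question rather than the list returned by older NetworkX versions; a node that lacks an attribute just gets NaN in that column:

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node(1, job="teacher", boss="dee")
G.add_node(2, job="teacher", boss="foo")
G.add_node(3, job="admin", boss="dee")
G.add_node(4, job="admin", boss="lopez")

# Each item of G.nodes(data=True) is a (node, attribute dict) tuple,
# so unpack it and merge the node number into one record per node.
records = [{"node": node, **attrs} for node, attrs in G.nodes(data=True)]
df = pd.DataFrame(records)
```

This way the attribute names do not need to be read from the first node, so it should also cope with nodes whose attribute sets differ.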
1

I updated this solution to work with my updated version of NetworkX (2.0) and thought I would share. I also had the function return a Pandas DataFrame.

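def nodes_to_df(graph):
    import pandas as pd
    # list of (node, attribute dict) tuples
    nodes = list(graph.nodes(data=True))
    data={}
    data['node']=[x[0] for x in nodes]
    # Take the column names from the first node's attributes.
    # graph.nodes[0] would look up a node labelled 0, which the question's graph does not have.
    other_cols = nodes[0][1].keys()
    for key in other_cols:
        data[key] = [x[1][key] for x in nodes]
    return pd.DataFrame(data)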

0

I have solved this with a dictionary comprehension.

d = {n:dag.nodes[n] for n in dag.nodes}

df = pd.DataFrame.from_dict(d, orient='index')

Your dictionary d maps the nodes n to dag.nodes[n]. Each value of that dictionary dag.nodes[n] is a dictionary itself and contains all attributes: {attribute_name:attribute_value}

So your dictionary d has the form:

{node_id : {attribute_name : attribute_value} }

The advantage I see is that you do not need to know the names of your attributes.

If you want the node IDs in a column rather than as the index, you can add this as the last command:

df.reset_index(drop=False, inplace=True)
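
As a variation on that last step, the index can be named before it is reset, so the new column is called node instead of index. A small sketch, using a graph G built like the one in the question in place of dag; the column name node is just my choice:

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node(1, job="teacher", boss="dee")
G.add_node(2, job="teacher", boss="foo")

# Same dictionary comprehension as above: node -> attribute dict
d = {n: G.nodes[n] for n in G.nodes}
df = pd.DataFrame.from_dict(d, orient="index")

# Name the index first so that reset_index() creates a column called "node"
df = df.rename_axis("node").reset_index()
```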
