
I'm having trouble figuring out how to add attributes to nodes in my network from columns in my dataframe.

An example of my dataframe is below. It has around 10 columns in total, but I only use the 5 columns shown when creating my network.

At the moment I can only get edge attributes working with my network, which I do as shown below:

g = nx.from_pandas_dataframe(df, 'node_from', 'node_to', edge_attr=['attribute1','attribute2','attribute3'])

The network will be a directed network. The attributes shown in the below dataframe are the attributes for the 'node_from' nodes. The 'node_to' nodes sometimes appear as 'node_from' nodes. All the nodes that can possibly be shown in the network and their respective attributes are shown in the df_attributes_only table.

df_relationship:

node_from:  node_to: ........ attribute1:   attribute2:   attribute3:
    jim      john    ........    tall          red             fat
    ...

All of the columns have words as their values, not digits.

I also have another dataframe which has each possible node and their attributes:

df_attributes_only:

id:   attribute1:   attribute2:     attribute3:
jim      tall          red             fat
john     small         blue            fat
...

I essentially need to assign the above three attributes to their respective id, so every node has their 3 attributes attached.

Any help on how I could get node attributes working with my network is greatly appreciated.

  • Quick question about the attributes. Are they describing the nodes they connect, or are they in some way describing the relationship? For example, is jim tall and fat? Does that in some way describe the relationship between jim and something else? Could there be multiple attribute values for one node, for example another entry for jim that shows the relationship but lists him as short and fat? Will jim have multiple relationships? Commented Feb 12, 2019 at 15:38
  • Please check my answer too, it's simpler. @dataframed Commented Apr 25, 2022 at 14:14

4 Answers


As of NetworkX 2.0, you can pass a dictionary of dictionaries to nx.set_node_attributes to set attributes for multiple nodes at once. This is much more streamlined than iterating over each node manually. The outer dictionary's keys are the nodes, and each inner dictionary's keys are the attributes you want to set for that node. Something like this:

attrs = {
    node0: {attr0: val00, attr1: val01},
    node1: {attr0: val10, attr1: val11},
    node2: {attr0: val20, attr1: val21},
}
nx.set_node_attributes(G, attrs)

You can find more detail in the documentation.


Using your example, assuming your index is id, you can convert your dataframe df_attributes_only of node attributes to this format and add to your graph:

df_attributes_only = pd.DataFrame(
    [['jim', 'tall', 'red', 'fat'], ['john', 'small', 'blue', 'fat']],
    columns=['id', 'attribute1', 'attribute2', 'attribute3']
)
node_attr = df_attributes_only.set_index('id').to_dict('index')
nx.set_node_attributes(g, node_attr)

g.nodes['jim']
# {'attribute1': 'tall', 'attribute2': 'red', 'attribute3': 'fat'}
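Since the question asks for a directed network, here is a minimal end-to-end sketch of the same idea. It assumes NetworkX 2.x or later (so from_pandas_edgelist rather than the older from_pandas_dataframe) and passes create_using=nx.DiGraph. The two-row df_relationship is made up for illustration, and the `missing` check is my own addition, not something the question asked for.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data modelled on the tables in the question.
df_relationship = pd.DataFrame(
    [['jim', 'john'], ['john', 'jim']],
    columns=['node_from', 'node_to'],
)
df_attributes_only = pd.DataFrame(
    [['jim', 'tall', 'red', 'fat'], ['john', 'small', 'blue', 'fat']],
    columns=['id', 'attribute1', 'attribute2', 'attribute3'],
)

# Build a directed graph from the edge list.
g = nx.from_pandas_edgelist(
    df_relationship, 'node_from', 'node_to', create_using=nx.DiGraph
)

# Convert the attribute table to {node: {attribute: value, ...}, ...}
node_attr = df_attributes_only.set_index('id').to_dict('index')
nx.set_node_attributes(g, node_attr)

# Nodes that are in the graph but have no row in the attribute table.
missing = set(g.nodes) - set(node_attr)
```

If `missing` is not empty, those nodes end up with an empty attribute dict, so it is worth checking before relying on the attributes downstream.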

nx.from_pandas_dataframe (and from_pandas_edgelist in the latest stable version, 2.2) conceptually converts an edge list to a graph. That is, each row in the dataframe represents an edge, which is a pair of two different nodes.

Using this API it is not possible to read nodes' attributes. It makes sense, because each row has two different nodes and keeping specific columns for the different nodes would be cumbersome and can cause discrepancies. For example, consider the following dataframe:

node_from node_to src_attr_1 tgt_attr_1
  a         b         0         3
  a         c         2         4

What should be the 'src_attr_1' value for node a? Is it 0 or 2? Moreover, we need to keep two columns for each attribute (since it's a node attribute both of the nodes in each edge should have it). In my opinion it would be bad design to support it, and I guess that's why NetworkX API doesn't.
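To see the ambiguity concretely, here is a small sketch (assuming pandas; the variable names are mine) that flags source nodes whose attribute column holds more than one distinct value across rows:

```python
import pandas as pd

# The same two-row example as above.
df = pd.DataFrame({
    'node_from': ['a', 'a'],
    'node_to': ['b', 'c'],
    'src_attr_1': [0, 2],
    'tgt_attr_1': [3, 4],
})

# Count the distinct src_attr_1 values seen for each source node.
distinct_values = df.groupby('node_from')['src_attr_1'].nunique()

# Any node with more than one distinct value has conflicting data.
conflicting_nodes = distinct_values[distinct_values > 1].index.tolist()
print(conflicting_nodes)
```

Running a check like this before assigning node attributes tells you whether "the" value for a node is even well defined in your data.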

You can still read nodes' attributes, after converting the df to a graph, as follows:

import networkx as nx
import pandas as pd

# Build a sample dataframe (with 2 edges: 0 -> 1, 0 -> 2, node 0 has attr_1 value of 'a', node 1 has 'b', node 2 has 'c')
d = {'node_from': [0, 0], 'node_to': [1, 2], 'src_attr_1': ['a','a'], 'tgt_attr_1': ['b', 'c']}
df = pd.DataFrame(data=d)
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# Iterate over df rows and set the source and target nodes' attributes for each row:
for index, row in df.iterrows():
    G.nodes[row['node_from']]['attr_1'] = row['src_attr_1']
    G.nodes[row['node_to']]['attr_1'] = row['tgt_attr_1']

print(G.edges())
print(G.nodes(data=True))
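If the row loop becomes slow on a large dataframe, one possible alternative is to stack the source and target columns into a single node table and set the attribute in one call. This is only a sketch on the same sample data. The `node` and `attr_1` column names are my own choice, and when a node has conflicting values, the one kept is decided by `keep='last'` on the stacked table, which will not necessarily match the loop's order of writes.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    'node_from': [0, 0],
    'node_to': [1, 2],
    'src_attr_1': ['a', 'a'],
    'tgt_attr_1': ['b', 'c'],
})
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

# Give the source and target columns common names so they can be stacked.
src = df[['node_from', 'src_attr_1']].rename(
    columns={'node_from': 'node', 'src_attr_1': 'attr_1'}
)
tgt = df[['node_to', 'tgt_attr_1']].rename(
    columns={'node_to': 'node', 'tgt_attr_1': 'attr_1'}
)

# One row per node; keep the last occurrence in the stacked table.
node_table = pd.concat([src, tgt]).drop_duplicates('node', keep='last')

# {node: value} mapping, stored under the attribute name 'attr_1'.
nx.set_node_attributes(
    G, node_table.set_index('node')['attr_1'].to_dict(), 'attr_1'
)
```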

Edit:

If you want a large list of attributes for the source node, you can extract the dictionary of these columns automatically as follows:

# List of desired source attributes:
src_attributes = ['src_attr_1', 'src_attr_2', 'src_attr_3']

# Iterate over df rows and set source node attributes:
for index, row in df.iterrows():
    # Pick only the wanted columns from this row
    src_attr_dict = {k: row[k] for k in src_attributes}
    G.nodes[row['node_from']].update(src_attr_dict)


This builds on @zohar.kom's answer. The problem can be solved without iterating over rows, so that answer can be optimized. I'm assuming that the attributes describe node_from.

Start with a graph from an edgelist (as in @zohar.kom's answer):

 G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

You can add the nodes and attributes first.

 # Create a mask with only the first records
 mask = ~df['node_from'].duplicated()
 # Get a list of nodes with attributes
 nodes = df[mask][['node_from','attribute1','attribute2','attribute3']]

This method for adding nodes from a dataframe comes from this answer.

 # Add the attributes one at a time.
 attr_dict = nodes.set_index('node_from')['attribute1'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr1')

 attr_dict = nodes.set_index('node_from')['attribute2'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr2')

 attr_dict = nodes.set_index('node_from')['attribute3'].to_dict()
 nx.set_node_attributes(G,attr_dict,'attr3')

The result is similar to @zohar.kom's answer, but with less iteration.
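The three separate calls can probably be collapsed into one by passing a dictionary of dictionaries, which nx.set_node_attributes accepts in NetworkX 2.x. A minimal sketch on made-up data follows. Note that it keeps the original column names (attribute1, ...) as the attribute keys rather than attr1, ..., and that the node 'ann' is hypothetical, added only to show a node that never appears in node_from.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list; the attribute columns describe node_from.
df = pd.DataFrame({
    'node_from': ['jim', 'jim', 'john'],
    'node_to': ['john', 'ann', 'jim'],
    'attribute1': ['tall', 'tall', 'small'],
    'attribute2': ['red', 'red', 'blue'],
    'attribute3': ['fat', 'fat', 'fat'],
})
G = nx.from_pandas_edgelist(df, 'node_from', 'node_to')

attr_cols = ['attribute1', 'attribute2', 'attribute3']

# Keep the first record per source node, indexed by node name.
nodes = df.drop_duplicates('node_from').set_index('node_from')[attr_cols]

# {node: {attribute: value, ...}} sets every attribute in a single call.
nx.set_node_attributes(G, nodes.to_dict('index'))
```

Nodes that only ever appear in node_to ('ann' here) get no attributes, which matches the point made in the comments: the data for those nodes has to come from somewhere else.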

5 Comments

What is the column name for node from? That is what needs to be in that column.
TiAddendum, I realize when posting this I forgot the quotation marks. I edited that in the answer.
Yes but your example does not show anywhere where the detail for node to will show up. What fields have data for nodes to?
In the example, you list both jim and john in a single row of the source data with attributes tall, red, and fat in the attribute fields. Does that mean that both jim and john have those attributes? If not what column name stores the attributes for the node_to?
Yes, you would need to add data for node_to. No amount of code will allow you to invent data that isn't there. Once added, you would use the exact same code, substituting node_to for node_from and the new fields that describe node_to for the old fields describing node_from. If you are querying the data and are able to pull these columns, there is probably a better way to solve this whole process, but we don't have enough information to make that call.

Answer:

Objective: From a dataframe, generate a network with nodes, edges, and node attributes.

Suppose we want to generate a network with nodes and node attributes, where each node has 3 attributes, i.e., attr1, attr2, and attr3.

Given a dataframe df whose 1st and 2nd columns are from_node and to_node respectively, followed by the attribute columns attr1, attr2, and attr3, the code below adds the required edges, nodes, and node attributes from the dataframe.

#%%time
import networkx as nx

# Add edges (this also creates the nodes)
g = nx.from_pandas_edgelist(df, 'from_node', 'to_node')
# Iterate over df rows and store each row's attribute columns on its from_node:
for index, row in df.iterrows():
    g.nodes[row.iloc[0]]['attr_dict'] = row.iloc[2:].to_dict()

list(g.edges())[0:5]
list(g.nodes(data=True))[0:5]
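Note that the loop above nests the attributes under a single 'attr_dict' key, so they are read back as g.nodes[n]['attr_dict']['attr1']. If you would rather have attr1, attr2, and attr3 as top-level node attributes, one option is to merge the row's dictionary with update() instead. A sketch on made-up data (the sample rows and the `tall_nodes` filter are my own illustration):

```python
import networkx as nx
import pandas as pd

# Hypothetical data in the layout described above.
df = pd.DataFrame({
    'from_node': ['jim', 'john'],
    'to_node': ['john', 'jim'],
    'attr1': ['tall', 'small'],
    'attr2': ['red', 'blue'],
    'attr3': ['fat', 'fat'],
})
g = nx.from_pandas_edgelist(df, 'from_node', 'to_node')

for _, row in df.iterrows():
    # update() merges the keys directly into the node's attribute dict.
    g.nodes[row['from_node']].update(row.iloc[2:].to_dict())

# Flat attributes can then be filtered on directly.
tall_nodes = [n for n, data in g.nodes(data=True) if data.get('attr1') == 'tall']
```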
