Construct bipartite graph from columns of python dataframe

Question

I have a dataframe with three columns.

data['subdomain'], data['domain'], data['IP']

I want to build a bipartite graph that links every subdomain to the domain it belongs to, with the weight being the number of subdomains that map to that domain.

For example my data could be:

subdomain, domain, IP
test1, example.org, 10.20.30.40
something, site.com, 30.50.70.90
test2, example.org, 10.20.30.41
test3, example.org, 10.20.30.42
else, website.com, 90.80.70.10

I want a bipartite graph in which example.org has a weight of 3, because it has 3 edges, and so on. I also want to group these results together into a new dataframe.

I have been trying with NetworkX, but I have no experience with it, especially when the edges need to be computed.

B=nx.Graph()
B.add_nodes_from(data['subdomain'],bipartite=0)
B.add_nodes_from(data['domain'],bipartite=1)
B.add_edges_from (...)

@lolkos You probably have the wrong idea about what weights in a graph represent. They represent the strength of the connection between two nodes. If you want to know the number of connections falling on a node, just find the degree of that node; you can also easily find which nodes it shares its edges with. I hope this helps :)
– hansrajswapnil, Commented Jun 1, 2021 at 10:18

unutbu · Accepted Answer · 2015-06-15 18:42:43Z

16

You could use

B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

to add weighted edges, or you could use

B.add_edges_from(
    [(row['domain'], row['subdomain']) for idx, row in df.iterrows()])

to add edges without weights.

You may not need weights, since the degree of a node is the number of edges incident to it. For example,

>>> B.degree('example.org')
3

import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'IP': ['10.20.30.40',
      '30.50.70.90',
      '10.20.30.41',
      '10.20.30.42',
      '90.80.70.10'],
     'domain': ['example.org',
      'site.com',
      'example.org',
      'example.org',
      'website.com'],
     'subdomain': ['test1', 'something', 'test2', 'test3', 'else']})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
B.add_weighted_edges_from(
    [(row['domain'], row['subdomain'], 1) for idx, row in df.iterrows()], 
    weight='weight')

print(B.edges(data=True))
# [('test1', 'example.org', {'weight': 1}), ('test3', 'example.org', {'weight': 1}), ('test2', 'example.org', {'weight': 1}), ('website.com', 'else', {'weight': 1}), ('site.com', 'something', {'weight': 1})]

pos = {node:[0, i] for i,node in enumerate(df['domain'])}
pos.update({node:[1, i] for i,node in enumerate(df['subdomain'])})
nx.draw(B, pos, with_labels=False)
for p in pos:  # raise text positions
    pos[p][1] += 0.25
nx.draw_networkx_labels(B, pos)

plt.show()

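yields a plot of the bipartite graph, with the domain nodes in the left column (x = 0) and the subdomain nodes in the right column (x = 1).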
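The question also asks for the per-domain counts in a new dataframe. A minimal sketch of one way to do that, assuming the same example data and reading the counts off the node degrees; the names `domains`, `degree_df` and the column `n_subdomains` are illustrative and not part of the original answer:

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({
    'subdomain': ['test1', 'something', 'test2', 'test3', 'else'],
    'domain': ['example.org', 'site.com', 'example.org',
               'example.org', 'website.com'],
})

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)
B.add_edges_from(zip(df['domain'], df['subdomain']))

# Keep only the nodes on the domain side of the bipartite graph.
domains = [n for n, attrs in B.nodes(data=True) if attrs.get('bipartite') == 1]

# One row per domain; 'n_subdomains' is a hypothetical column name.
degree_df = pd.DataFrame(
    {'domain': domains,
     'n_subdomains': [B.degree(n) for n in domains]})
print(degree_df)
```

Because `nx.Graph` keeps at most one edge per pair of nodes, the degree here counts distinct subdomains per domain, not repeated rows.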

edited Jun 15, 2015 at 18:42

answered Jun 15, 2015 at 18:06


1 Comment

Iolkos Over a year ago

Yes, but this way we set the weight to 1. I wanted something that counts how many times "example.org" has been queried and then gives it a weight of 3, matching the number of edges that go to example.org.
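One possible way to get what this comment describes, sketched here under the assumption that every edge of a domain should carry that domain's row count as its weight: count the rows per domain with pandas first, then pass the count as the third element of each edge tuple. The names `counts` and `summary` are illustrative and do not come from the thread.

```python
import pandas as pd
import networkx as nx

df = pd.DataFrame({
    'subdomain': ['test1', 'something', 'test2', 'test3', 'else'],
    'domain': ['example.org', 'site.com', 'example.org',
               'example.org', 'website.com'],
})

# Number of rows per domain (a Series indexed by domain name).
counts = df.groupby('domain').size()

B = nx.Graph()
B.add_nodes_from(df['subdomain'], bipartite=0)
B.add_nodes_from(df['domain'], bipartite=1)

# Each edge gets the row count of its domain as the 'weight' attribute.
B.add_weighted_edges_from(
    (dom, sub, int(counts[dom]))
    for dom, sub in zip(df['domain'], df['subdomain']))

# The same counts as a standalone dataframe with columns 'domain' and 'weight'.
summary = counts.reset_index(name='weight')
print(summary)
```

Note that this stores a per-domain quantity on every edge of that domain, which is the concern raised in the comment under the question; the node degree carries the same information without edge weights.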
