Creating an edge list from a pandas dataframe

Question

I'd like to create an edge list with a weight attribute that counts pair occurrences (e.g., the number of months in which the pair a-b appeared in the same group).

The dataframe contains a monthly snapshot of the people in a particular team (there are no duplicate names within a monthly group):

monthyear	name
jun2020	a
jun2020	b
jun2020	c
jul2020	a
jul2020	b
jul2020	d

The output should look like the following (the graph is undirected, so the pair a-b is the same as b-a):

node1	node2	weight
a	b	2
b	c	1
a	c	1
a	d	1
b	d	1

I managed to create a new dataframe with all the name combinations using the following:

df1 = pd.DataFrame(data=list(combinations(df['name'].unique().tolist(), 2)), columns=['node1', 'node2'])

Now I'm not sure how to iterate over this new dataframe to populate the weights. How can this be done?

It's unclear to me how the weight is calculated. Can you show us?
– Pavan Suvarna, Commented Sep 17, 2021 at 1:51
Just a clarification on your output: should there also be an a-c node pair, since there is an a-d pair?
– Raymond Toh, Commented Sep 17, 2021 at 1:52
Can there be duplicate values within the same month? For example, 2 rows with a when monthyear=jun2020.
– Shaido, Commented Sep 17, 2021 at 2:01
Yes, there should be an extra a-c in the output. I'll add it!
– h3rmit, Commented Sep 17, 2021 at 13:12

Shaido · Accepted Answer · 2021-09-17 02:11:38Z

Assuming that there are no duplicates within each monthyear group, you can get all 2-combinations of names within each group and then group by the node names to obtain the weight.

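from itertools import combinations

import pandas as pd

def get_combinations(group):
    # Sort each pair so that a-b and b-a map to the same (node1, node2) row.
    pairs = [sorted(pair) for pair in combinations(group['name'].values, 2)]
    return pd.DataFrame(pairs, columns=['node1', 'node2'])

# One row per pair of names that share a month.
df = df.groupby('monthyear').apply(get_combinations)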

This will give you an intermediate result:

            node1 node2
monthyear              
jul2020   0     a     b
          1     a     d
          2     b     d
jun2020   0     a     b
          1     a     c
          2     b     c

Now, calculate the weight:

df = df.groupby(['node1', 'node2']).size().to_frame('weight').reset_index()

Final result:

  node1 node2  weight
0     a     b       2
1     a     c       1
2     a     d       1
3     b     c       1
4     b     d       1
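
As a possible alternative, the same counting can be sketched without the intermediate dataframe by feeding each month's pairs into a collections.Counter. This is only a minimal sketch, not part of the approach above: the names pair_counts and edges are made up for illustration, the sample data is rebuilt from the question, and the set() call is an extra safeguard that drops repeated names within a month in case the no-duplicates assumption does not hold.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "monthyear": ["jun2020"] * 3 + ["jul2020"] * 3,
    "name": ["a", "b", "c", "a", "b", "d"],
})

# Count each unordered pair once per month. sorted(set(...)) puts every pair
# in a canonical order and drops repeated names within a month (assumption:
# a repeated name should still count only once for that month).
pair_counts = Counter()
for _, names in df.groupby("monthyear")["name"]:
    pair_counts.update(combinations(sorted(set(names)), 2))

# Turn the counter into an edge list with one row per pair.
edges = (
    pd.DataFrame(
        [(n1, n2, w) for (n1, n2), w in pair_counts.items()],
        columns=["node1", "node2", "weight"],
    )
    .sort_values(["node1", "node2"])
    .reset_index(drop=True)
)
print(edges)
```

On the sample data this should produce the same five edges as the groupby version. Which of the two is faster will depend on the number of months and the group sizes, so it is worth timing both on real data.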
