I'd like to create an edge list with a weight attribute that counts pair occurrences, e.g., the number of months in which the pair a-b appears together in the same group.

The dataframe contains a monthly snapshot of the people in a particular team (there are no duplicates within a monthly group):

monthyear name
jun2020 a
jun2020 b
jun2020 c
jul2020 a
jul2020 b
jul2020 d
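
For reproducibility, the sample above can be built as a dataframe along these lines. This is a minimal sketch: the variable name `df` is chosen to match the snippet later in this question, and the final check only encodes the stated "no duplicates within a month" assumption.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "monthyear": ["jun2020", "jun2020", "jun2020", "jul2020", "jul2020", "jul2020"],
    "name": ["a", "b", "c", "a", "b", "d"],
})

# Sanity check: no (monthyear, name) row appears more than once.
has_duplicates = df.duplicated(["monthyear", "name"]).any()
print(has_duplicates)
```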

The output should look like the following (the graph is undirected, so the pair a-b is the same as b-a):

node1 node2 weight
a b 2
b c 1
a c 1
a d 1
b d 1

I managed to create a new dataframe with the name combinations using the following:

df1 = pd.DataFrame(data=list(combinations(df['name'].unique().tolist(), 2)), columns=['node1', 'node2'])

Now I'm not sure how to iterate over this new dataframe to populate the weights. How can this be done?

  • It's unclear to me. How is the weight calculated? Can you show us? Commented Sep 17, 2021 at 1:51
  • Just a clarification on your output: should there also be a node pair a-c, since there is an a-d? Commented Sep 17, 2021 at 1:52
  • Can there be duplicate values within the same month? For example, 2 rows with a when monthyear=jun2020. Commented Sep 17, 2021 at 2:01
  • No, there are no duplicates within the monthly groupings. Commented Sep 17, 2021 at 13:11
  • Yes, there should be an extra a-c pair in the output. I'll add it! Commented Sep 17, 2021 at 13:12

1 Answer

Assuming that there are no duplicates within each monthyear group, you can get all 2-combinations of names within each group and then group by the node names to obtain the weight.

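from itertools import combinations

import pandas as pd

def get_combinations(group):
    # Sort each pair so that a-b and b-a end up as the same (node1, node2) row
    pairs = [sorted(pair) for pair in combinations(group['name'], 2)]
    return pd.DataFrame(pairs, columns=['node1', 'node2'])

# One row per pair per month, indexed by (monthyear, position within the month)
df = df.groupby('monthyear').apply(get_combinations)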

This will give you an intermediate result:

            node1 node2
monthyear              
jul2020   0     a     b
          1     a     d
          2     b     d
jun2020   0     a     b
          1     a     c
          2     b     c

Now, calculate the weight:

df = df.groupby(['node1', 'node2']).size().to_frame('weight').reset_index()

Final result:

  node1 node2  weight
0     a     b       2
1     a     c       1
2     a     d       1
3     b     c       1
4     b     d       1
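
As a cross-check, the same counts can be reproduced without `groupby().apply()` by tallying pairs in a `collections.Counter`. This is an alternative sketch, not the method used above; the names `pair_counts` and `edges` are illustrative, and the `set()` call is a defensive addition in case the "no duplicates within a month" assumption does not hold.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "monthyear": ["jun2020", "jun2020", "jun2020", "jul2020", "jul2020", "jul2020"],
    "name": ["a", "b", "c", "a", "b", "d"],
})

pair_counts = Counter()
for _, names in df.groupby("monthyear")["name"]:
    # Sorting first means every pair is emitted as (smaller, larger),
    # so a-b and b-a are counted under the same key.
    pair_counts.update(combinations(sorted(set(names)), 2))

edges = pd.DataFrame(
    [(node1, node2, weight) for (node1, node2), weight in pair_counts.items()],
    columns=["node1", "node2", "weight"],
).sort_values(["node1", "node2"], ignore_index=True)

print(edges)
```

The Counter keeps only one entry per distinct pair, which may be preferable when there are many months, since no intermediate dataframe with one row per pair per month is built.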