I have a pandas dataframe of the form:

index | id    | group
0     | abc   | A
1     | abc   | B
2     | abc   | B
3     | abc   | C
4     | def   | A
5     | def   | B
6     | ghi   | B
7     | ghi   | C

I would like to transform this into a weighted graph / adjacency matrix where the nodes are the 'group' values and each weight is the number of row pairs that share an id across the two groups. The pairs are counted per id and then summed, so:

AB = 'abc' indexes (0,1),(0,2) + 'def' indexes (4,5) = 3

AC = 'abc' (0,3) = 1

BC = 'abc' (2,3), (1,3) + 'ghi' (6,7) = 3

and the resulting matrix would be:

  | A | B | C
A | 0 | 3 | 1
B | 3 | 0 | 3
C | 1 | 3 | 0
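
The counting rule above can be spelled out as a brute-force reference, which is handy for checking any faster approach against the expected matrix. This is only a minimal sketch: the names `pair_counts` and `expected` are made up for illustration, and the loop is not meant to be efficient.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "group": ["A", "B", "B", "C", "A", "B", "B", "C"],
})

# Count every pair of rows that share an id but belong to different groups.
pair_counts = Counter()
for _, groups in df.groupby("id")["group"]:
    for g1, g2 in combinations(groups, 2):
        if g1 != g2:
            pair_counts[frozenset((g1, g2))] += 1

# Spread the pair counts into a symmetric matrix with a zero diagonal.
labels = sorted(df["group"].unique())
expected = pd.DataFrame(0, index=labels, columns=labels)
for pair, n in pair_counts.items():
    g1, g2 = tuple(pair)
    expected.loc[g1, g2] = n
    expected.loc[g2, g1] = n

print(expected)
```

Using `frozenset` as the key means A-B and B-A land in the same counter entry, so the matrix is symmetric by construction.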

At the moment I am doing this very inefficiently by:

f = df.groupby(['id']).agg({'group':pd.Series.nunique}) # to count groups per id
f.loc[f['group']>1] # to get a list of the ids with >1 group

# I then loop over the ids, counting the values per group pair (this takes a long time).

This is a crude first-pass approach. I'm sure there must be an alternative using groupby or crosstab, but I can't figure it out.

  • correct: edited (commented Mar 22, 2018 at 13:36)

2 Answers

You can use the following:

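import numpy as np
import pandas as pd

df_merge = df.merge(df, on='id')  # pair up all rows that share an id
results = pd.crosstab(df_merge.group_x, df_merge.group_y)  # count each group pair
np.fill_diagonal(results.values, 0)  # drop the self-pairs (A-A, B-B, C-C)
results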

Output:

group_y  A  B  C
group_x         
A        0  3  1
B        3  0  3
C        1  3  0

Note: the difference between your result and mine (three instead of two for B-C and C-B) is due to the duplicate B record for 'abc' at index rows 1 and 2.
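
To make the effect of the duplicate record in the note concrete, here is a hedged sketch that runs the same merge-and-crosstab idea with and without duplicate (id, group) rows. The helper name `group_adjacency` is hypothetical, and it zeroes the diagonal on a copy of the array rather than in place, which is a choice of this sketch and not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "group": ["A", "B", "B", "C", "A", "B", "B", "C"],
})


def group_adjacency(frame):
    """Self-merge on id, cross-tabulate the group pairs, zero the diagonal."""
    merged = frame.merge(frame, on="id")
    counts = pd.crosstab(merged["group_x"], merged["group_y"])
    values = counts.to_numpy().copy()  # work on a copy, not in place
    np.fill_diagonal(values, 0)
    return pd.DataFrame(values, index=counts.index, columns=counts.columns)


# Every row counts, so the repeated ('abc', 'B') row contributes twice.
with_duplicates = group_adjacency(df)

# Each (id, group) combination counts once.
without_duplicates = group_adjacency(df.drop_duplicates(subset=["id", "group"]))

print(with_duplicates)
print(without_duplicates)
```

With the duplicates kept, A-B and B-C are both 3. After dropping them, both fall to 2, while A-C stays at 1.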

1 Comment

Nice! I knew crosstab was the way to go but couldn't figure it out.

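You could also try a dot product:

s = pd.crosstab(df.id, df.group)  # rows per id (index) and group (columns)
s = s.T.dot(s)                    # sum over ids of count[g1] * count[g2]
np.fill_diagonal(s.values, 0)     # zero the self-pair counts on the diagonal
s

Output:
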
group  A  B  C
group         
A      0  3  1
B      3  0  3
C      1  3  0
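
Why the dot product gives the pair counts may be easier to see step by step. The sketch below is self-contained, uses the question's sample data, and introduces its own variable names (`counts`, `product`, `adjacency`) purely for illustration. It also zeroes the diagonal on a copy of the array, which is an assumption of this sketch rather than what the answer does.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "group": ["A", "B", "B", "C", "A", "B", "B", "C"],
})

# Rows are ids, columns are groups, entries are the number of rows.
counts = pd.crosstab(df["id"], df["group"])

# product[g1, g2] = sum over ids of counts[id, g1] * counts[id, g2],
# i.e. the number of (g1 row, g2 row) pairs that share an id.
product = counts.T.dot(counts)

# The diagonal holds self-pair counts, so replace it with zeros on a copy.
values = product.to_numpy().copy()
np.fill_diagonal(values, 0)
adjacency = pd.DataFrame(values, index=product.index, columns=product.columns)

print(counts)
print(adjacency)
```

For example, 'abc' has one A row and two B rows, so it contributes 1 * 2 = 2 to A-B, and 'def' contributes 1 * 1 = 1, giving 3 in total.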
