I've just started coding and am trying to understand how NetworkX works. I have a pandas DataFrame with one column of document names and several topic columns. Each topic column is a 0/1 indicator of whether that topic is present in the document (row).
```python
import pandas as pd
df = pd.DataFrame({'DOC': ['Doc_A', 'Doc_B', 'Doc_C', 'Doc_D', 'Doc_E'], 'topic_A': [0,0,1,0,0], 'topic_B': [1,0,0,1,0], 'topic_C': [0,1,1,1,0]})
```
```
     DOC  topic_A  topic_B  topic_C
0  Doc_A        0        1        0
1  Doc_B        0        0        1
2  Doc_C        1        0        1
3  Doc_D        0        1        1
4  Doc_E        0        0        0
```
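Because each topic column is a 0/1 indicator, the numeric part of `df` is a document-by-topic incidence matrix. As a minimal sketch (pandas only), multiplying that matrix by its transpose gives a document-by-document table whose off-diagonal entries count how many topics each pair of documents shares, and whose diagonal counts each document's own topics.

```python
import pandas as pd

df = pd.DataFrame({'DOC': ['Doc_A', 'Doc_B', 'Doc_C', 'Doc_D', 'Doc_E'],
                   'topic_A': [0, 0, 1, 0, 0],
                   'topic_B': [1, 0, 0, 1, 0],
                   'topic_C': [0, 1, 1, 1, 0]})

# Rows = documents, columns = topics, values = 0/1 presence
incidence = df.set_index('DOC')

# (docs x topics) @ (topics x docs): entry [i, j] = number of topics shared by docs i and j;
# the diagonal [i, i] is simply the number of topics in doc i
shared = incidence.dot(incidence.T)
print(shared)
```

The off-diagonal counts are exactly the edge weights you would want for a weighted document network; the diagonal should be ignored when building a graph from this table, otherwise it would turn into self-loops.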
What I'd like to do is create two networks:
1) The documents are the nodes and the topics are the edges (unweighted), so two documents that share several topics are connected by several parallel edges, one per shared topic.
2) The documents are the nodes and the topics are the edges, but instead of parallel edges, each pair of documents has a single edge weighted by the number of topics they share.
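A minimal sketch of both networks, assuming `networkx` and `pandas` are installed and that every topic column name starts with `topic_` (an assumption based only on the example's column names). For item 1 it uses an `nx.MultiGraph`, adding one edge per topic for every pair of documents that both contain that topic, keyed by the topic name; for item 2 it collapses those parallel edges into a plain `nx.Graph` whose `weight` attribute is the edge count.

```python
import itertools

import networkx as nx
import pandas as pd

df = pd.DataFrame({'DOC': ['Doc_A', 'Doc_B', 'Doc_C', 'Doc_D', 'Doc_E'],
                   'topic_A': [0, 0, 1, 0, 0],
                   'topic_B': [1, 0, 0, 1, 0],
                   'topic_C': [0, 1, 1, 1, 0]})

# Assumption: topic columns are recognised by their 'topic_' prefix
topic_cols = [c for c in df.columns if c.startswith('topic_')]

# 1) MultiGraph: one (parallel) edge per topic shared by a pair of documents
M = nx.MultiGraph()
M.add_nodes_from(df['DOC'])  # keeps documents without topics (Doc_E) as isolated nodes
for topic in topic_cols:
    docs = df.loc[df[topic] == 1, 'DOC'].tolist()
    for u, v in itertools.combinations(docs, 2):
        # the edge key and a 'topic' attribute both record which topic links u and v
        M.add_edge(u, v, key=topic, topic=topic)

# 2) Simple Graph: collapse parallel edges into one edge weighted by their count
G = nx.Graph()
G.add_nodes_from(M.nodes)
for u, v in M.edges():  # yields one (u, v) tuple per parallel edge
    if G.has_edge(u, v):
        G[u][v]['weight'] += 1
    else:
        G.add_edge(u, v, weight=1)

print(list(M.edges(keys=True)))
print(list(G.edges(data=True)))
```

In this small example no pair of documents shares more than one topic, so the multigraph happens to have no parallel edges and every weight is 1; with real data the same code produces parallel edges and larger weights.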
How can I do this? And am I even thinking about the problem correctly?
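One common way to think about this data (a framing swapped in here, not something stated in the question) is as a bipartite graph: documents on one side, topics on the other, with an edge wherever a topic is present. Projecting that graph onto the documents gives the item-2 network directly. The sketch below assumes `networkx` and `pandas` and uses NetworkX's `bipartite.weighted_projected_graph`, which sets each edge's `weight` to the number of neighbours (here, topics) the two documents share.

```python
import networkx as nx
import pandas as pd
from networkx.algorithms import bipartite

df = pd.DataFrame({'DOC': ['Doc_A', 'Doc_B', 'Doc_C', 'Doc_D', 'Doc_E'],
                   'topic_A': [0, 0, 1, 0, 0],
                   'topic_B': [1, 0, 0, 1, 0],
                   'topic_C': [0, 1, 1, 1, 0]})
topic_cols = [c for c in df.columns if c.startswith('topic_')]  # assumed naming convention

# Bipartite graph: documents (bipartite=0) linked to the topics they contain (bipartite=1)
B = nx.Graph()
B.add_nodes_from(df['DOC'], bipartite=0)
B.add_nodes_from(topic_cols, bipartite=1)

# Reshape to one row per (document, topic) pair and keep only the present ones
pairs = df.melt(id_vars='DOC', value_vars=topic_cols, var_name='topic', value_name='present')
B.add_edges_from(pairs.loc[pairs['present'] == 1, ['DOC', 'topic']]
                 .itertuples(index=False, name=None))

# Project onto the documents: weight = number of shared topics
P = bipartite.weighted_projected_graph(B, df['DOC'].tolist())
print(list(P.edges(data=True)))
```

The bipartite graph `B` is also a natural place to keep the topic information itself, since each document's topics are just its neighbours there.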