Create adjacency matrix in python from csv dataset

Question

I have data that comes in the format as follows:

eventid    mnbr
20         1
26         1
12         2
14         2
15         3
14         3
10         3

eventid is an event that the member attended. The data is structured as a panel, so each member can attend multiple events and multiple members can attend the same event. My goal is to create an adjacency matrix that shows:

 mnbr  1    2    3
 1     1    0    0
 2     0    1    1
 3     0    1    1

where there is a 1 whenever two members attend the same event. I was able to read the columns of the CSV file into two separate 1D numpy arrays, but I am unsure how to proceed from here. How best do I create a matrix using column 2, and how do I then use column 1 to fill in the values? I understand I haven't posted any code and don't expect a full solution in that regard, but I would greatly appreciate an idea of how to approach the problem efficiently. I have roughly 3 million observations, so creating too many intermediate variables would be problematic. Thanks in advance.

I received a notification that my question is a potential duplicate; however, my problem was with parsing the data rather than creating the adjacency matrix.

Do you have an estimate of how many unique members and events you have? If your arrays are called eventid and mnbr, you can determine this with len(set(eventid)) and len(set(mnbr)).
– Gabriel, Commented Apr 22, 2015 at 5:16
Also, you'll need to use something other than a matrix to store your results, as 3 million squared integers won't fit in memory unless you have a few thousand GB of RAM. Perhaps a sparse matrix or an adjacency list.
– Gabriel, Commented Apr 22, 2015 at 5:28
Sorry, the above is wrong: you'll need to check that len(set(mnbr))**2 integers will fit in memory if you want to use a matrix.
– Gabriel, Commented Apr 22, 2015 at 5:35
@Gabriel I just checked the length. I was able to use a subset for this task; I currently have 280801 observations.
– thyde, Commented Apr 22, 2015 at 20:40
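
To make the check suggested in the comments concrete, here is a rough sketch of the arithmetic. The helper name dense_matrix_bytes and the one-byte-per-entry default (roughly what a numpy int8 array would use) are assumptions for illustration, not something stated in the thread; a plain Python list of lists would need considerably more.

```python
def dense_matrix_bytes(n_members, bytes_per_entry=1):
    """Approximate size of an n x n dense matrix, ignoring container overhead."""
    return n_members ** 2 * bytes_per_entry

# Member column from the question's sample data.
mnbr = [1, 1, 2, 2, 3, 3, 3]
n_unique = len(set(mnbr))
print(n_unique, "unique members ->", dense_matrix_bytes(n_unique), "bytes")

# Hypothetical larger case: 100,000 unique members at 1 byte per entry.
print(dense_matrix_bytes(100_000) / 1e9, "GB")
```

The point is that the size depends on the number of unique members squared, not on the number of observations.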

Arthur Vaïsse · Accepted Answer · 2015-04-23 07:03:52Z

Here is a solution. It does not directly give you the requested adjacency matrix, but it gives you what you need to build it yourself.

# Assume every line of your input is stored as a tuple (eventid, mnbr).
observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]

# Build an event link dictionary, i.e. something that links every event to all its mnbrs.
eventLinks = {}

for (eventid, mnbr) in observations:
    # If this event has never been encountered, create a new entry in eventLinks.
    if eventid not in eventLinks:
        eventLinks[eventid] = []

    eventLinks[eventid].append(mnbr)

# Collect the mnbrs.
mnbrs = set(mnbr for (eventid, mnbr) in observations)

# Create a member link dictionary. This one links each mnbr to the other mnbrs linked to it.
mnbrLinks = {mnbr: set() for mnbr in mnbrs}

for mnbrList in eventLinks.values():
    # For each mnbr, add all the mnbrs involved in the same event.
    for mnbr in mnbrList:
        mnbrLinks[mnbr].update(mnbrList)

print(mnbrLinks)

Executing this code gives the following result:

{1: {1}, 2: {2, 3}, 3: {2, 3}}
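
The hard-coded observations list stands in for the parsed file. As a rough sketch, assuming the file is comma-separated with a header row eventid,mnbr (the thread never shows the actual delimiter), the tuples could be read with the standard csv module and the two dictionaries built with collections.defaultdict. The names csv_text, read_observations, event_links and mnbr_links are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the real file; assumes comma-separated values with a header row.
csv_text = """eventid,mnbr
20,1
26,1
12,2
14,2
15,3
14,3
10,3
"""

def read_observations(handle):
    """Read (eventid, mnbr) integer tuples from an open CSV handle."""
    reader = csv.DictReader(handle)
    return [(int(row["eventid"]), int(row["mnbr"])) for row in reader]

observations = read_observations(io.StringIO(csv_text))

# Map each event to the set of members who attended it.
event_links = defaultdict(set)
for eventid, mnbr in observations:
    event_links[eventid].add(mnbr)

# Map each member to every member who shared at least one event with them.
mnbr_links = defaultdict(set)
for members in event_links.values():
    for mnbr in members:
        mnbr_links[mnbr] |= members

print(dict(mnbr_links))
```

For a file on disk, the same function could be called on a handle opened with open(path, newline=""), using whatever path applies.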

This is a dictionary in which each mnbr has an associated set of adjacent mnbrs. It is in fact an adjacency list, which is a compressed form of an adjacency matrix. You can expand it into the matrix you requested by using the dictionary keys and values as row and column indexes.
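
As a minimal sketch of that expansion, the function below maps each label to a row/column position first, so the member numbers do not have to run from 1 to n. The name to_matrix is hypothetical, and the sketch assumes every neighbour also appears as a key, which holds for the dictionary built above.

```python
adjacency_list = {1: {1}, 2: {2, 3}, 3: {2, 3}}

def to_matrix(adjacency):
    """Expand an adjacency list into (labels, matrix) with 0/1 entries."""
    labels = sorted(adjacency)
    # Position of each label in the matrix rows and columns.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for label, neighbours in adjacency.items():
        for other in neighbours:
            matrix[index[label]][index[other]] = 1
    return labels, matrix

labels, matrix = to_matrix(adjacency_list)
for label, row in zip(labels, matrix):
    print(label, row)
```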

Hope it helps. Arthur.

EDIT: I provided an approach based on an adjacency list so that you can build your own adjacency matrix from it. However, if your data are sparse, you should consider using the adjacency list itself as your data structure. See http://en.wikipedia.org/wiki/Adjacency_list
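
To illustrate why the adjacency list can be enough on its own, here is a small sketch that answers adjacency queries straight from the dictionary and compares the number of stored entries with the size of the equivalent dense matrix. The function name are_adjacent and the variable names are illustrative.

```python
adjacency_list = {1: {1}, 2: {2, 3}, 3: {2, 3}}

def are_adjacent(adjacency, a, b):
    """True if b is in a's neighbour set; unknown members have no neighbours."""
    return b in adjacency.get(a, set())

# Entries actually stored versus cells in a dense n x n matrix.
stored_entries = sum(len(neighbours) for neighbours in adjacency_list.values())
dense_entries = len(adjacency_list) ** 2

print(are_adjacent(adjacency_list, 1, 2), are_adjacent(adjacency_list, 2, 3))
print(stored_entries, "stored entries vs", dense_entries, "dense cells")
```

The gap between the two counts grows as the data get sparser, which is the case where the list pays off.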

EDIT 2: Added code to convert the adjacency list into a small adjacency matrix class.

adjacencyList = {1: {1}, 2: {2, 3}, 3: {2, 3}}

class AdjacencyMatrix():

    def __init__(self, adjacencyList, label = ""):
        """
        Constructor.
        Creates an adjacency matrix from an adjacency list.
        Assumes the graph vertices are labeled with the numbers 1 to n.
        """

        self.matrix = []
        self.label = label

        #create an empty matrix
        for i in range(len(adjacencyList.keys())):
            self.matrix.append( [0]*(len(adjacencyList.keys())) )

        for key in adjacencyList.keys():
            for value in adjacencyList[key]:
                self[key-1][value-1] = 1

    def __str__(self):
        # Returning self.__repr__() instead would just print the list of lists;
        # see the Python docs on the difference between __str__ and __repr__.

        #label first line
        string = self.label + "\t"
        for i in range(len(self.matrix)):
            string += str(i+1) + "\t"
        string += "\n"

        #for each matrix line :
        for row in range(len(self.matrix)):
            string += str(row+1) + "\t"
            for column in range(len(self.matrix)):
                string += str(self[row][column]) + "\t"
            string += "\n"


        return string

    def __repr__(self):
        return str(self.matrix)

    def __getitem__(self, index):
        """ Return a row, which allows element access with matrix[row][column] syntax """
        return self.matrix.__getitem__(index)

    def __setitem__(self, index, item):
        """ Replace a whole row with matrix[index] = row (matrix[row][column] = value goes through __getitem__) """
        return self.matrix.__setitem__(index, item)

    def areAdjacent(self, i, j):
        return self[i-1][j-1] == 1

m = AdjacencyMatrix(adjacencyList, label="mbr")
print(m)
print("m.areAdjacent(1,2) :",m.areAdjacent(1,2))
print("m.areAdjacent(2,3) :",m.areAdjacent(2,3))

This code gives the following result:

mbr 1   2   3   
1   1   0   0   
2   0   1   1   
3   0   1   1   

m.areAdjacent(1,2) : False
m.areAdjacent(2,3) : True

Thank you so much for the help. Is there any way to directly create some of the common adjacency visualizations from this dictionary?
This is one of the common adjacency visualizations ;) But yes. I'll provide an example.
