I have a very large CSV file containing a matrix like this:

null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4

It is always an n*n matrix. The first column and the first row hold the names. I want to convert it to a three-column format (also called an edge list or long form) like this:

A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4

I have used:

row = 0
for line in fin:
    line = line.strip("\n")
    col = 0
    tokens = line.split(",")
    for t in tokens:
        fout.write("\n%s,%s,%s"%(row,col,t))
        col += 1
    row += 1

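This doesn't work: the output contains row and column numbers instead of the names, and the header row and first column are treated as data.

Could you please help? Thank you.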

2 Answers

You also need to iterate over the column titles as you print out the individual cells.

For a matrix file mat.csv:

null,A,B,C
A,0,2,3
B,3,4,2
C,1,2,4

The following program:

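with open("mat.csv") as f:
    # The first line holds the column names; drop the corner cell ("null").
    columns = f.readline().strip().split(',')[1:]
    for line in f:
        tokens = line.strip().split(',')
        row = tokens[0]  # the first token is the row name
        for column, cell in zip(columns, tokens[1:]):
            print('{},{},{}'.format(row, column, cell))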

prints out:

A,A,0
A,B,2
A,C,3
B,A,3
B,B,4
B,C,2
C,A,1
C,B,2
C,C,4
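
Since the file in the question is very large, the same loop can also stream rows through the standard csv module and write straight to an output file instead of printing. This is only a minimal sketch, assuming Python 3; the function name matrix_to_edges and the file name edges.csv are made up for illustration and are not part of the original answer:

```python
import csv


def matrix_to_edges(fin, fout):
    """Read a labelled n*n matrix from fin and write row,column,value lines to fout.

    fin and fout are open text file objects (hypothetical helper, not from the answer).
    """
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator="\n")
    # Header row: drop the corner cell ("null") and keep the column names.
    columns = next(reader)[1:]
    for tokens in reader:
        if not tokens:  # csv.reader yields [] for blank lines; skip them
            continue
        row = tokens[0]  # the first cell of each data row is the row name
        for column, cell in zip(columns, tokens[1:]):
            writer.writerow([row, column, cell])


# Usage (file names are assumptions):
# with open("mat.csv", newline="") as fin, open("edges.csv", "w", newline="") as fout:
#     matrix_to_edges(fin, fout)
```

Only one row of the matrix is held in memory at a time, so the size of the input should not matter much.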

To generate only the upper triangle (diagonal included), you can use the following script:

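with open("mat.csv") as f:
    # The first line holds the column names; drop the corner cell ("null").
    columns = f.readline().strip().split(',')[1:]
    for i, line in enumerate(f):
        tokens = line.strip().split(',')
        row = tokens[0]  # the first token is the row name
        # Data row i starts at column i, so cells left of the diagonal are skipped.
        for column, cell in zip(columns[i:], tokens[i+1:]):
            print('{},{},{}'.format(row, column, cell))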

which results in the output:

A,A,0
A,B,2
A,C,3
B,B,4
B,C,2
C,C,4

1 Comment

Instead of col=0, set col=row

You need to skip the first column in each line:

for t in tokens[1:]:
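
Applied to the loop from the question, that change on its own would still write numeric indices rather than names, so the header line probably needs to be read first as well. A sketch of how the question's loop might look with both changes, assuming Python 3 and that fin and fout are already-open file objects as in the question; the function name convert is hypothetical:

```python
def convert(fin, fout):
    """Hypothetical rewrite of the question's loop; fin/fout are open text files."""
    # Read the header once and keep the column names (skip the corner cell).
    names = fin.readline().rstrip("\n").split(",")[1:]
    for line in fin:
        tokens = line.rstrip("\n").split(",")
        if tokens == [""]:  # ignore blank lines
            continue
        row = tokens[0]  # the row name is the first token
        # Skip the first column and pair each remaining cell with its column name.
        for col, t in zip(names, tokens[1:]):
            fout.write("%s,%s,%s\n" % (row, col, t))
```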
