I am new to Python. I am trying to parse a file to extract certain columns and write them to an output file. I was able to parse and extract the desired columns, but I am having trouble writing them to the output file.
Here is the original test file:
EGW05759 Pld5 I79_005987 GO_function: GO:0003824 - catalytic activity [Evidence IEA]; GO_process: GO:0008152 - metabolic process [Evidence IEA]
EGW05760 Exo1 I79_005988 GO_function: GO:0003677 - DNA binding [Evidence IEA]; GO_function: GO:0003824 - catalytic activity [Evidence IEA]; GO_function: GO:0004518 - nuclease activity [Evidence IEA]; GO_process: GO:0006281 - DNA repair [Evidence IEA]
Here is my Python code:
f = open('test_parsing.txt', 'rU')
f1 = open('test_parsing_out.txt', 'a')
for line in f:
    match = re.search('\w+\s+(\w+)\s+\w+\s+\w+\:', line)
    match1 = re.findall('GO:\d+', line)
    f1.write(match.group(1), match1)
f1.close()
Basically, I want the output to look like this (though I know my code is not complete enough to achieve this):
Pld5 GO:0003824:GO:0008152
Exo1 GO:0003677:GO:0003824:GO:0004518:GO:0006281
Thanks
Upendra
tsv file. Look into the csv Python module to parse it more accurately.
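As a rough sketch of that suggestion: the write() call in the question fails because a file's write() accepts a single string, so the gene name and the list of GO IDs have to be joined into one string first. The code below assumes the columns are tab-separated (the sample as pasted does not make that certain) and that the GO annotations start in the fourth column. The names format_row and convert are made up for illustration and are not part of any library.

```python
import csv
import re

# Matches GO identifiers such as "GO:0003824".
GO_ID = re.compile(r"GO:\d+")


def format_row(row):
    """Build 'gene GO:...:GO:...' from one parsed row.

    Assumes the gene name is column 2 and the GO annotations start
    at column 4. Returns None for rows with fewer than 4 columns.
    """
    if len(row) < 4:
        return None
    gene = row[1]
    # Search every annotation column in case the text spans several.
    go_ids = GO_ID.findall(" ".join(row[3:]))
    # write() needs a single string, so join everything up front.
    return gene + " " + ":".join(go_ids)


def convert(infile, outfile):
    """Read tab-separated rows from infile, write one line per row to outfile."""
    # QUOTE_NONE: treat quote characters in the annotations as plain text.
    reader = csv.reader(infile, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        line = format_row(row)
        if line is not None:
            outfile.write(line + "\n")


# Possible usage with the file names from the question:
#
# with open("test_parsing.txt", newline="") as src, \
#         open("test_parsing_out.txt", "w") as dst:
#     convert(src, dst)
```

If the file turns out to be space-separated rather than tab-separated, splitting each line with line.split(None, 3) instead of using csv.reader would be one alternative; the join before write() stays the same either way.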