How to parse a file and write to an output file in Python

Question

I am a newbie to Python. I am trying to parse a file to extract certain columns and write them to an output file. I was able to parse and extract the desired columns, but I am having trouble writing them to an output file.

Here is the original test file:

EGW05759        Pld5    I79_005987      GO_function: GO:0003824 - catalytic activity [Evidence IEA]; GO_process: GO:0008152 - metabolic process [Evidence IEA]                                  
EGW05760        Exo1    I79_005988      GO_function: GO:0003677 - DNA binding [Evidence IEA]; GO_function: GO:0003824 - catalytic activity [Evidence IEA]; GO_function: GO:0004518 - nuclease activity [Evidence IEA]; GO_process: GO:0006281 - DNA repair [Evidence IEA]

Here is my Python code:

import re

f = open('test_parsing.txt', 'r')
f1 = open('test_parsing_out.txt', 'a')
for line in f:
    match = re.search(r'\w+\s+(\w+)\s+\w+\s+\w+\:', line)
    match1 = re.findall(r'GO:\d+', line)
    f1.write(match.group(1), match1)
f1.close()

Basically, I want the output to look like this (though I know my code is not yet complete enough to achieve it):

Pld5 GO:0003824:GO:0008152
Exo1 GO:0003677:GO:0003824:GO:0004518:GO:0006281

Thanks

Upendra

Looks like you have a TSV file. Look into the csv Python module to parse it more accurately.
– Adam Smith, Commented Sep 14, 2014 at 20:13

Hasan Ramezani · Accepted Answer · 2014-09-14 20:27:11Z

4

import re

with open('test_parsing.txt') as f, open('test_parsing_out.txt', 'a') as f1:
    for line in f:
        match = re.search(r'\w+\s+(\w+)\s+\w+\s+\w+:', line)
        if not match:
            continue  # skip blank or malformed lines
        match1 = re.findall(r'GO:\d+', line)
        # write() takes a single string, so join the GO IDs first
        f1.write('%s %s\n' % (match.group(1), ':'.join(match1)))
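For illustration, a minimal sketch of why the question's f1.write(match.group(1), match1) fails: a file object's write() accepts exactly one string, while re.findall() returns a list. It uses io.StringIO as an in-memory stand-in for the output file and the first sample line from the question; the variable names are illustrative only.

```python
import io
import re

# First sample line from the question (columns separated by runs of spaces).
line = ("EGW05759        Pld5    I79_005987      GO_function: GO:0003824 - catalytic "
        "activity [Evidence IEA]; GO_process: GO:0008152 - metabolic process [Evidence IEA]")

match = re.search(r'\w+\s+(\w+)\s+\w+\s+\w+:', line)
go_ids = re.findall(r'GO:\d+', line)  # a list, e.g. ['GO:0003824', 'GO:0008152']

out = io.StringIO()  # in-memory stand-in for the output file

# Passing two arguments to write() raises TypeError before anything is written.
try:
    out.write(match.group(1), go_ids)
except TypeError as exc:
    print("TypeError:", exc)

# Joining the list into one string first gives write() the single str it expects.
out.write('%s %s\n' % (match.group(1), ':'.join(go_ids)))
print(out.getvalue(), end='')
```

The separator passed to join() is the only thing that decides how the IDs are glued together; using ',' instead of ':' (as in upendra's comment) changes nothing else.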

answered Sep 14, 2014 at 20:27

Hasan Ramezani



1 Comment

upendra Over a year ago

This is awesome. I'm happy that my code mostly still holds up. I made a slight edit to your code to suit the desired output. Here it is: f1.write('%s %s \n'%(match.group(1), ','.join(match1)))

Adam Smith · Accepted Answer · 2014-09-14 20:25:03Z

2

Using the csv module:

import csv
import re

with open('test_parsing.txt', newline='') as infile, open('test_parsing_out.txt', 'a') as outfile:
    reader = csv.reader(infile, delimiter="\t")
    for line in reader:
        # GO identifiers have 7 digits, so match all of them
        result = line[1] + " " + ':'.join(re.findall(r"GO:\d{7}", line[3]))
        outfile.write(result + "\n")

# OUTPUT
Pld5 GO:0003824:GO:0008152
Exo1 GO:0003677:GO:0003824:GO:0004518:GO:0006281

edited Sep 14, 2014 at 20:25

answered Sep 14, 2014 at 20:19

Adam Smith


3 Comments

upendra Over a year ago

I think there is a problem with the code here. I'm getting this error: "IndexError: list index out of range". Can you please check?

Adam Smith Over a year ago

When the OP copied and pasted the text into SO, the tabs were converted to spaces. Replace each of those runs of spaces with an actual tab and it works beautifully.
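A minimal sketch of the failure mode described above: with delimiter="\t", csv.reader finds no tab in a space-aligned line, so the whole line comes back as one field and line[1] raises IndexError. The two sample rows are shortened from the question for illustration.

```python
import csv
import io

# The same record twice: once aligned with spaces (as pasted), once with real tabs.
spaced = "EGW05759        Pld5    I79_005987      GO_function: GO:0003824 - catalytic activity\n"
tabbed = "EGW05759\tPld5\tI79_005987\tGO_function: GO:0003824 - catalytic activity\n"

spaced_row = next(csv.reader(io.StringIO(spaced), delimiter="\t"))
tabbed_row = next(csv.reader(io.StringIO(tabbed), delimiter="\t"))

print(len(spaced_row))  # 1: the whole line is one field, so spaced_row[1] raises IndexError
print(len(tabbed_row))  # 4: one field per column
print(tabbed_row[1])    # Pld5
```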

Adam Smith Over a year ago

@upendra re.sub(r"\s{2,}", "\t", txt) :)
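One way to apply the re.sub idea before the text reaches csv.reader, sketched under two assumptions: columns are separated by two or more spaces, and single spaces occur only inside the GO annotation column. It works line by line, strips trailing whitespace first, and matches spaces only (" {2,}") rather than \s{2,}, because on the whole file \s{2,} would also swallow the newline after a line ending in trailing spaces (as the first sample line does in the question) and merge two records. The helper name normalize_columns is hypothetical.

```python
import csv
import io
import re

def normalize_columns(text):
    """Hypothetical helper: turn runs of 2+ spaces into tabs, line by line."""
    lines = (re.sub(r" {2,}", "\t", line.rstrip()) for line in text.splitlines())
    return "\n".join(line for line in lines if line) + "\n"  # drop blank lines

# Space-aligned input as pasted in the question (trailing spaces included).
raw = (
    "EGW05759        Pld5    I79_005987      GO_function: GO:0003824 - catalytic activity [Evidence IEA]; "
    "GO_process: GO:0008152 - metabolic process [Evidence IEA]      \n"
    "EGW05760        Exo1    I79_005988      GO_function: GO:0003677 - DNA binding [Evidence IEA]; "
    "GO_function: GO:0003824 - catalytic activity [Evidence IEA]; "
    "GO_function: GO:0004518 - nuclease activity [Evidence IEA]; "
    "GO_process: GO:0006281 - DNA repair [Evidence IEA]\n"
)

results = []
for row in csv.reader(io.StringIO(normalize_columns(raw)), delimiter="\t"):
    results.append(row[1] + " " + ":".join(re.findall(r"GO:\d{7}", row[3])))

print("\n".join(results))
```

If a description ever contained two consecutive spaces, this heuristic would split it into an extra column, so real tab-separated input remains the safer option.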
