Problem extracting data from a text file

Question

I am new to Python, and I want to extract data from this format:

FBpp0143497 5 151 5 157 PF00339.22 Arrestin_N Domain 1 135 149 83.4 1.1e-23 1 CL0135
FBpp0143497 183 323 183 324 PF02752.15 Arrestin_C Domain 1 137 138 58.5 6e-16 1 CL0135
FBpp0131987 60 280 51 280 PF00089.19 Trypsin Domain 14 219 219 127.7 3.7e-37 1 CL0124

to this format

> FBpp0143497
5 151 Arrestin_N 1.1e-23
> FBpp0143497
183 323 Arrestin_C 6e-16

I have written the code below hoping it would work, but it does not. Please help!

file = open('/ddfs/user/data/k/ktrip_01/hmm.txt','r')
rec = file.read()   
for line in rec :
         field = line.split("\t")
         print field             
         print field[:]             
         print '>',field[0]             
         print   field[1], field[2],   field[6], field[12]

The hmm.txt file is:

FBpp0143497 5    151      5    157 PF00339.22  Arrestin_N        Domain     1   135   149     83.4   1.1e-23   1 CL0135   

FBpp0143497    183    323    183    324 PF02752.15  Arrestin_C        Domain     1   137   138     58.5     6e-16   1 CL0135   


FBpp0131987     60    280     51    280 PF00089.19  Trypsin           Domain    14   219   219    127.7   3.7e-37   1 CL0124

@parijat: you should've edited your original question, not posted a new one. – SilentGhost, Commented May 20, 2010 at 13:12

SilentGhost · Accepted Answer · 2010-05-20 13:10:49Z

3

To iterate over a file line by line, you should do:

with open(fname) as file:
    for line in file:
        fields = line.split('\t')
        print(fields)          # select fields you want to print
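As a rough sketch of the "select fields you want" step, the snippet below picks the same indices the question uses (0, 1, 2, 6 and 12). The helper name `format_record` is hypothetical, and it assumes the rows really are tab-separated; it returns `None` for blank or short lines, since the sample file contains empty lines.

```python
def format_record(line):
    """Return a two-line summary for one tab-separated row, or None if the row is blank or short."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 13:
        # Blank lines (and anything without an E-value column) are skipped.
        return None
    return "> {0}\n{1} {2} {3} {4}".format(
        fields[0], fields[1], fields[2], fields[6], fields[12]
    )


# One row from the question, written out with explicit tabs (an assumption about the file).
sample = (
    "FBpp0143497\t5\t151\t5\t157\tPF00339.22\tArrestin_N\tDomain"
    "\t1\t135\t149\t83.4\t1.1e-23\t1\tCL0135\n"
)
print(format_record(sample))
```

Inside the loop above, each `line` would be passed to the helper and the result printed only when it is not `None`.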

answered May 20, 2010 at 13:10


unutbu · Accepted Answer · 2010-05-20 13:32:51Z

1

Use the csv module to parse your tab-separated fields:

import csv
filename='/ddfs/user/data/k/ktrip_01/hmm.txt'

template='''\
> {field[0]}
{field[1]} {field[2]} {field[6]} {field[12]}'''

with open(filename,"r") as f:
    csvobj=csv.reader(f,delimiter='\t')
    for field in csvobj:
        if field:
            print(template.format(field=field))

yields:

> FBpp0143497
5 151 Arrestin_N 1.1e-23
> FBpp0143497
183 323 Arrestin_C 6e-16
> FBpp0131987
60 280 Trypsin 3.7e-37
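The file as pasted in the question looks column-aligned with runs of spaces, so it may not be tab-delimited at all. If that is the case, a tab delimiter would leave each row as a single field. A minimal sketch of a fallback, assuming no field itself contains whitespace (the name `parse_row` is hypothetical):

```python
def parse_row(line):
    """Split a row on any run of spaces or tabs and pick the five columns of interest."""
    parts = line.split()  # no argument: splits on arbitrary whitespace, ignores leading/trailing
    if len(parts) < 13:
        # Blank or incomplete line.
        return None
    return parts[0], parts[1], parts[2], parts[6], parts[12]


# A row copied from the question's space-aligned listing.
row = (
    "FBpp0131987     60    280     51    280 PF00089.19  Trypsin"
    "           Domain    14   219   219    127.7   3.7e-37   1 CL0124"
)
print(parse_row(row))
```

This works for tab-separated rows as well, which makes it a reasonable choice when the exact delimiter is uncertain.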

edited May 20, 2010 at 13:32

answered May 20, 2010 at 13:21


Greg Hewgill · Accepted Answer · 2010-05-20 13:12:45Z

0

This line:

rec = file.read()

reads your whole file into rec, line breaks and all. You probably want to do this:

rec = file.readlines()

This is just one way to read lines from a file in Python. It's not always the best way, because this will load all the lines of the file into memory. If your input file contains, say, three million lines, it might be better to read and process each line one at a time.
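To make the difference concrete, here is a small sketch that uses `io.StringIO` as a stand-in for a real file (the two-row sample text is made up for illustration). It shows why looping over the result of `read()` gives single characters, while `readlines()` and direct iteration give whole lines.

```python
import io

# Stand-in for the contents of a small tab-separated file.
sample = "FBpp0143497\t5\t151\nFBpp0131987\t60\t280\n"

# read() returns one big string; iterating over a string yields characters.
chars = list(io.StringIO(sample).read())

# readlines() returns a list of lines, all held in memory at once.
lines = io.StringIO(sample).readlines()

# Iterating over the file object yields one line at a time, without building the full list.
first_fields = []
for line in io.StringIO(sample):
    first_fields.append(line.rstrip("\n").split("\t")[0])

print(chars[:4])
print(lines)
print(first_fields)
```

With a real file, the last form would be written as `for line in f:` inside a `with open(...) as f:` block.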

answered May 20, 2010 at 13:12


3 Comments

PARIJAT Over a year ago

Thanks a lot, it works. Basically, I am very new to the language.

PARIJAT Over a year ago

Could you let me know a better way to do this? I would be thankful for a better method, as I have to work with a proteome dataset.

Greg Hewgill Over a year ago

@PARIJAT: The other two answers here show two different methods to process the data line by line.
