f=open('sequence3.fasta', 'r')
str=''

for line in f:
    line2=line.rstrip('\n')
    if (line2[0]!='>'):
        str=str+line2
    elif (len(line)==0):
        break

str.rstrip('\n') 
f.close()

The script is supposed to read three DNA sequences and join them into one sequence. The problem is that I get this error:

IndexError: string index out of range

And when I write it like this:

f=open('sequence3.fasta', 'r')
str=''

for line in f:
    line.rstrip('\n')
    if (line[0]!='>'):
        str=str+line
    elif (len(line)==0):
        break

str.rstrip('\n') 
f.close()

It runs, but there are spaces in between. Thanks!

4 Answers

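The second version doesn't crash because the line line.rstrip('\n') is a no-op: rstrip returns a new string and doesn't modify the existing one (line). Every line therefore keeps its trailing '\n', so a blank line is "\n" and line[0] still exists, and those newlines end up in the concatenated result, which is where the "spaces" between the sequences come from. The first version crashes probably because your input file contains empty lines, so line.rstrip('\n') returns an empty string and line2[0] is out of range. Try this: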

f=open('sequence3.fasta', 'r')
str=''

for line in f:
    line2=line.rstrip('\n')
    if line2 and line2[0]!='>':
        str=str+line2
    elif len(line)==0:
        break

"if line2" is equivalent to "if len(line2) > 0". Similarly, you could replace your "elif len(line)==0" with "elif not line".
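A short sketch that reproduces both behaviours without needing a FASTA file: rstrip returns a new string and leaves the original untouched, indexing an empty string raises IndexError, and an empty string is falsy, which is why "if line2" guards the index. The blank line "\n" is simply an assumed example input.

```python
line = "\n"  # what iterating over a file yields for a blank line

# rstrip() returns a new string; the original is unchanged
stripped = line.rstrip("\n")
print(repr(line))      # '\n'  (unchanged)
print(repr(stripped))  # ''

# Indexing the unstripped line works, because '\n' has one character
print(repr(line[0]))   # '\n'

# Indexing the stripped (empty) line fails, as in the first version
try:
    stripped[0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: string index out of range

# An empty string is falsy, so "if stripped and ..." short-circuits safely
print(bool(stripped))                         # False
print(bool(stripped) and stripped[0] != ">")  # False, no exception raised
```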

Your empty-line condition is in the wrong place. Try:

for line in f:
    line = line.rstrip('\n')

    if len(line) == 0: # or simply: if not line:
        break

    if line[0] != '>':
        str=str+line

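Another solution is to use .startswith: if not line.startswith('>'). Unlike line[0], line.startswith('>') does not raise IndexError on an empty string; it just returns False.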
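A minimal sketch of the startswith variant, assuming header lines begin with '>' and everything else is sequence data. Because an empty string passes the check and adds nothing to the result, no separate empty-line test (and no break) is needed. The helper name join_sequence_lines and the sample lines are hypothetical.

```python
def join_sequence_lines(lines):
    """Concatenate all non-header lines without their trailing newlines.

    Hypothetical helper: lines starting with '>' are headers; a blank
    line becomes '' after rstrip and contributes nothing.
    """
    seq = ""
    for line in lines:
        line = line.rstrip("\n")  # keep the returned copy
        if not line.startswith(">"):  # safe even when line == ''
            seq += line
    return seq


lines = [">seq1\n", "ACGT\n", "\n", ">seq2\n", "TTGA\n", ">seq3\n", "CC"]
print(join_sequence_lines(lines))  # ACGTTTGACC
```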

Comments

This version still won't work. If there are empty lines within the file, your loop will end prematurely.
The last line of a file could still be missing its trailing '\n'.
I'm not talking about the last line. Consider the following input: line1\n\nline3. Here "line3" will be omitted by your solution.
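The input from that comment can be checked directly. This sketch uses io.StringIO in place of the real file (an assumption made so the example is self-contained) and compares breaking on a blank line with skipping it; the helper collect and its on_empty parameter are illustrative only.

```python
import io

text = "line1\n\nline3"


def collect(f, on_empty):
    """Collect stripped lines; on_empty is 'break' or 'skip' (illustrative)."""
    out = []
    for line in f:
        line = line.rstrip("\n")
        if not line:
            if on_empty == "break":
                break  # stops reading at the first blank line
            continue   # skips the blank line and keeps reading
        out.append(line)
    return out


print(collect(io.StringIO(text), "break"))  # ['line1']
print(collect(io.StringIO(text), "skip"))   # ['line1', 'line3']
```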

line.rstrip('\n') returns a copy of line, and you do nothing with it. It doesn't change "line".

The exception "IndexError: string index out of range" means that "line[0]" cannot be accessed, so "line" must be empty. Perhaps you should write it like this:

for line in f:
    line = line.rstrip('\n')
    if line:
        if (line[0]!='>'):
            str=str+line
    else:
        break

Comments

If there are empty lines within the file, your loop will end prematurely.
I've preserved the original algorithm (i.e., I don't know what the author wants to do).
You've changed too much: you didn't preserve the author's intent to check for zero-length read lines and not to break on empty lines. Refer to my answer.
I'm sure that: 1) the author's algorithm won't stop reading if an empty line occurs; 2) my algorithm won't stop reading if an empty line occurs; 3) your change to the algorithm causes it to stop reading when an empty line occurs.
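Point 1 can be checked with a quick sketch: iterating over a file object never yields an empty string, because a blank line arrives as '\n' and the loop simply ends at end of file, so the original "elif len(line)==0: break" branch never runs. (An empty string is what f.readline() returns at end of file, which is where the zero-length check idiom comes from.) io.StringIO stands in for the real file here, and the sample text is made up.

```python
import io

# A blank line in the middle arrives as '\n', never as ''
f = io.StringIO(">seq1\nACGT\n\nTTGA")
raw_lines = list(f)
print(raw_lines)  # ['>seq1\n', 'ACGT\n', '\n', 'TTGA']

# So a length check on the raw line never triggers inside the loop
print(any(len(line) == 0 for line in raw_lines))  # False
```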

You shouldn't use your second code example, where you don't save the return value of rstrip. rstrip doesn't modify the string it is called on; as the documentation for str.rstrip puts it: "Return a copy of the string with trailing characters removed."

Also, in your if/else statement the first condition you check should be for length 0; otherwise you'll get an IndexError from indexing past the end of the string.

Additionally, having a break in your if/else statement will end your loop early if you have an empty line. Instead of breaking, you could simply do nothing when the length is 0:

if (len(line2) != 0):
    if (line2[0] != '>'):
        str = str+line2

Also, your line near the end, str.rstrip('\n'), isn't doing anything, since the return value of rstrip isn't saved.
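Putting this answer's points together, a minimal sketch might look like the following: zero-length lines are skipped instead of ending the loop, and the return value of rstrip is kept. The with statement, collecting parts and joining them once, calling rstrip() with no argument (which also removes a trailing '\r' from Windows line endings), avoiding a variable named str (which shadows the built-in type), the function name and the sample records are all my own assumptions, not part of the question.

```python
import os
import tempfile


def read_concatenated_sequence(path):
    """Return all sequence lines of a FASTA-style file joined into one string.

    Sketch only: header lines start with '>', blank lines are skipped
    rather than ending the loop, and the result of rstrip() is kept.
    """
    parts = []
    with open(path, "r") as f:  # the file is closed automatically
        for line in f:
            line = line.rstrip()  # also drops '\r' and trailing spaces
            if len(line) != 0 and line[0] != ">":
                parts.append(line)
    return "".join(parts)  # one join instead of repeated +=


# Demo with a throwaway file standing in for 'sequence3.fasta'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sequence3.fasta")
    with open(path, "w") as out:
        out.write(">seq1\nACGT\n\n>seq2\nTTGA\n>seq3\nCC\n")
    result = read_concatenated_sequence(path)

print(result)  # ACGTTTGACC
```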

Comments

If there are empty lines within the file, your loop will end prematurely.
This was meant as an example, using the OP's code, of how to fix the index out of range error. I'll see about adding more to fix that as well.
It's not about "adding more"; you've already added too much :) Refer to my answer.
It's about explaining why certain things are wrong and how to fix them, which is what I'm attempting to do. In your answer you don't explain why the first thing you should do is check the length. SO is all about teaching and learning, not posting code.
I'm surprised that you see only code in my answer. I explained the differences between the author's snippets and how it is possible that the second one works and the first one crashes. I assumed that the author understands the error and just doesn't know why it occurs in only one of the snippets. And I fixed the snippet correctly. I'm not saying that your answer is bad; it could be better, however, if you posted correct code. Your code fixes the author's error but introduces a new error of your own. Also, the question was about "spaces" contaminating the result of the algorithm, and my answer covers that.