I have the following code:

import fileinput
import sys
import re

def reading_fasta(fasta):
    #### defining variables
    name = ""
    hashseq = {}
    sequence = ""
    ### loop over file
    for line in fileinput.input(fasta):
        if not line:
            hashseq[name] = sequence
            fileinput.close()
            break
        elif re.match("^>.*", line):
            if fileinput.lineno() != 1:
                hashseq[name] = sequence
                del sequence
            name = re.split('\W+', line)[1]
        else:
            line.rstrip("\n")
            sequence += line

reading_fasta(sys.argv[1])

with the following error:

Traceback (most recent call last):
  File "parse.py", line 25, in <module>
    reading_fasta(sys.argv[1])
  File "parse.py", line 23, in reading_fasta
    sequence += line
UnboundLocalError: local variable 'sequence' referenced before assignment

Why is that? Searching the net, I've found the same error, but in those cases it was caused by variables defined in the global scope. My variables, however, are defined inside the function, as local variables.

1 Answer

You delete the variable when you reach the second header line; this unbinds the local name, so the next sequence += line has nothing to read:

del sequence

Don't do that; rebind it to an empty string instead:

sequence = ''
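To see the effect in isolation, here is a minimal sketch; the function name demo and the sample strings are made up for illustration and are not part of the original script. It deletes a local name and then tries to extend it, which raises the same exception type as in the traceback:

```python
def demo():
    sequence = "ACGT"
    del sequence  # unbinds the local name; it no longer refers to anything
    try:
        # += must read the current value first, and there is none
        sequence += "TT"
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"


result = demo()
print(result)
```

The exact wording of the error message differs between Python versions, but the exception type stays the same.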

Note that it is more efficient to collect strings in a list first, then concatenate in one go with str.join():

import re

def reading_fasta(fasta):
    hashseq = {}
    with open(fasta) as inf:
        sequence = []
        name = ''
        for line in inf:
            if re.match("^>.*", line):
                # header line: store the previous record, if any
                if sequence:
                    hashseq[name] = ''.join(sequence)
                    sequence = []
                name = re.split(r'\W+', line)[1]
            else:
                sequence.append(line.rstrip('\n'))
        if sequence:
            # remainder: the last record has no header after it
            hashseq[name] = ''.join(sequence)
    return hashseq

I've reworked your code a little to use the input file object as a context manager, so it is closed automatically. Looping over a file object never yields an empty string (a blank line still arrives as "\n"), so the if not line branch never runs; the reliable way to handle the end of the file is to store the remainder after the loop has completed. You also don't need fileinput.input() and its lineno() check here, because you can simply test whether sequence is non-empty instead.
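The point about empty lines can be checked with a short sketch. It uses io.StringIO as a stand-in for a real file, and the sample content is invented for illustration:

```python
import io

# In-memory stand-in for a file that contains one blank line
handle = io.StringIO(">seq1\nACGT\n\nTTGA\n")
lines = list(handle)

# The blank line is delivered as "\n", which is a non-empty (truthy) string
empty_seen = any(line == "" for line in lines)
print(lines)
print(empty_seen)
```

Since no iterated line is ever the empty string, a test like if not line cannot signal the end of the input.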

I also assumed you wanted to return the resulting dictionary. :-)
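As a rough usage sketch, the function can be tried on a small throwaway file. The function is restated here only so the block runs on its own; the sample records and the temporary file are assumptions made for illustration, not data from the question:

```python
import os
import re
import tempfile


def reading_fasta(fasta):
    hashseq = {}
    with open(fasta) as inf:
        sequence = []
        name = ''
        for line in inf:
            if re.match("^>.*", line):
                if sequence:
                    hashseq[name] = ''.join(sequence)
                    sequence = []
                name = re.split(r'\W+', line)[1]
            else:
                sequence.append(line.rstrip('\n'))
        if sequence:
            hashseq[name] = ''.join(sequence)
    return hashseq


# Hypothetical two-record FASTA content
sample = ">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n"

with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    records = reading_fasta(path)
finally:
    os.remove(path)  # clean up the temporary file

print(records)
```

Note that only the first word after > is kept as the record name, because the header is split on non-word characters.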

1 Comment

It is amazing how a little thing can waste so much of my time... Thanks a lot!
