dict = {}
tag = ""
with open('/storage/emulated/0/Download/sequence.fasta.txt','r') as sequence:
    seq = sequence.readlines()
    for line in seq:
        if line.startswith(">"):
            tag = line.replace("\n", "")
        else:
            seq = "".join(seq[1:])
            dict[tag] = seq.replace("\n", "")
print(dict)
Background for those who aren't familiar with FASTA files: the format contains one or more DNA, RNA, or protein sequences. Each sequence has a one-line descriptive tag that starts with ">", followed by the sequence itself on the lines after it (for DNA, a long run of A, T, G, and C). The sequence is also broken up by many unnecessary line breaks.

So far this code works when there is only one sequence per file, but it seems to ignore the if condition when there are multiple. It should add a new tag: sequence pair to the dictionary every time it encounters a ">". Instead it only runs once: it uses the first description as the key, joins the rest of the file regardless of ">" characters, and uses that as the value. How can I get this loop to notice a new ">" after the first occurrence?
I am purposefully steering away from the biopython module.
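
For reference, here is a minimal sketch of the behaviour described above, using only the standard library. The function name `parse_fasta` and the in-memory sample are hypothetical stand-ins, not part of the original script. The sketch assumes every sequence line belongs to the most recent ">" line and that tags are unique within the file.

```python
import io


def parse_fasta(lines):
    """Map each '>' tag line to its sequence with line breaks removed."""
    records = {}
    tag = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new tag starts a new record.
            tag = line
            records[tag] = []
        elif tag is not None:
            # Sequence lines accumulate under the current tag.
            records[tag].append(line)
    # Join the collected pieces once per record.
    return {t: "".join(parts) for t, parts in records.items()}


# Hypothetical two-record sample standing in for the real file.
sample = io.StringIO(">seq1 first\nATGC\nGGTA\n>seq2 second\nTTAA\nCC\n")
records = parse_fasta(sample)
print(records)
```

The key difference from the script above is that each line is handled on its own instead of joining `seq[1:]`, so a later ">" line starts a fresh entry. With a real file, the open file object could be passed in place of `sample`, since iterating over it yields lines in the same way.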