Given a text file that looks like this when loaded:
>rice1 1ALBRGHAER
NNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNN
>peanuts2 2LAEKaq
SSSSSSSSSSS
>OIL3 3hkasUGSV
ppppppppppppppppppppp
ppppppppppppppppppppp
How can I extract all the lines that fall between lines containing '>', including the final lines, which are not followed by another '>' line?
For example, the result should look like this:
result = ['NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN','SSSSSSSSSSS','pppppppppppppppppppppppppppppppppppppppppp']
I'm realizing that what I did won't work, because it's looking for text between each newline and '>'. Running this just gives me empty strings.
import re

def findtext(inputtextfile, start, end):
    try:
        pattern = rf'{start}(.*?){end}'
        return re.findall(pattern, inputtextfile)
    except ValueError:
        return -1
result = findtext(inputtextfile,"\n", ">")
Use the pattern >.*\s*([^>]+) and extract the contents of group 1 from each match, storing them in a list.
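A minimal sketch of that approach, assuming the whole file has already been read into one string. The function name `extract_blocks` and the variable `sample` are made up for illustration. Group 1 still contains the newlines between the captured lines, so the sketch strips all whitespace from each match to get the joined strings shown in the question:

```python
import re

def extract_blocks(text):
    # '>.*' consumes the header line, '\s*' the newline after it,
    # and '([^>]+)' captures everything up to the next '>' or the end of the text.
    matches = re.findall(r'>.*\s*([^>]+)', text)
    # Each captured block still holds newlines; split() and join() remove them.
    return ["".join(block.split()) for block in matches]

# Sample input rebuilt from the question (21 characters per N and p line).
sample = (
    ">rice1 1ALBRGHAER\n"
    + "N" * 21 + "\n"
    + "N" * 21 + "\n"
    + ">peanuts2 2LAEKaq\n"
    + "S" * 11 + "\n"
    + ">OIL3 3hkasUGSV\n"
    + "p" * 21 + "\n"
    + "p" * 21 + "\n"
)

result = extract_blocks(sample)
print(result)
```

Because `[^>]` also matches newlines, the last block is captured even though no '>' follows it. This assumes '>' never appears inside the data lines themselves.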