sp|P46531|NOTC1_HUMAN Neurogenic locus notch homolog protein 1 OS=Homo sapiens GN=NOTCH1 PE=1 SV=4 MPPLLAPLLCLALLP
I have a FASTA file and I would like to search it for the beginning of the amino acid sequence. My attempt so far looks something like this:
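    # The 20 standard one-letter amino acid codes
    aminoacids = ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
    for filename in file_list:
        with open(filename, 'r') as fh:
            while True:
                char = fh.read(1)
                if not char:
                    # read(1) returns '' at end of file; stop instead of looping forever
                    break
                if char.upper() in aminoacids:
                    # look for the 4 characters directly after it
                    pass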
What I am missing is the next step: if a character is in the amino acid list and the four characters after it are also in the list, I want to build a string that starts with that character and runs to the end of the file. For example, if M is found while iterating through the file, I would like to check the next four characters (PPLL). If those four are all amino acids, I would like to create a string starting with M and continuing to the end of the file.
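One possible way to express the rule described above (one amino acid letter followed by four more) is a regular expression that looks for a run of five such letters. This is only a sketch under stated assumptions: the function name `find_sequence_start` and its parameters `run_length` and `ignore_case` are made up for illustration, and the whole text is assumed to fit in memory. The sample line is the one from the top of the question.

```python
import re

# The 20 standard one-letter amino acid codes, same letters as the list above.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_sequence_start(text, run_length=5, ignore_case=True):
    """Return text from the first run of `run_length` amino acid letters to the end.

    Hypothetical helper for illustration; returns None when no such run exists.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # Builds a pattern such as [ACDEFGHIKLMNPQRSTVWY]{5}
    pattern = re.compile("[%s]{%d}" % (AMINO_ACIDS, run_length), flags)
    match = pattern.search(text)
    return text[match.start():] if match else None


line = ("sp|P46531|NOTC1_HUMAN Neurogenic locus notch homolog protein 1 "
        "OS=Homo sapiens GN=NOTCH1 PE=1 SV=4 MPPLLAPLLCLALLP")

# Case-insensitive, like char.upper() in the loop above:
# this stops early, at "genic" inside "Neurogenic".
loose = find_sequence_start(line)

# Uppercase letters only: on this line the first qualifying run is the sequence.
strict = find_sequence_start(line, ignore_case=False)
```

Note that ordinary words in the description can pass the five-letter test, so the case-insensitive version does not find MPPLL first on this sample. Matching uppercase only happens to work for this line, but it is not a general guarantee. To apply the function to a file, the contents could be read with `fh.read()` and passed in as `text`.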