My code uses 4 lists: splitinputString1, splitinputString2, splitinputString3, and mainlistsplit. The list mainlistsplit is much longer because it contains every possible 4-letter string over the letters A, C, T, and G. The other 3 lists each hold a predetermined 10-letter input string that has been split into overlapping 4-letter strings.
My goal is to find the 4-letter strings from mainlistsplit that appear in all 3 input strings at the same time. I also have to allow at most 1 mismatched letter per comparison. For example, ACTG in mainlistsplit and ACTC in one of the input strings should count as a match.
I have tried writing a function, is_close_match(), but I am sure I am missing something small in my code; I am just not sure what it is.
My question is: how should I go about comparing these lists of strings, finding the strings that match with at most 1 mismatch, and then returning and printing them?
import itertools
# Creates 3 lists, one with each of the input strings
lst = ['A', 'C', 'T', 'G', 'A', 'C', 'G', 'C', 'A', 'G']
lst2 = ['T', 'C', 'A', 'C', 'A', 'A', 'C', 'G', 'G', 'G']
lst3 = ['G', 'A', 'G', 'T', 'C', 'C', 'A', 'G', 'T', 'T']
mainlist = ['A', 'C', 'T', 'G']
# Build every possible length-4 string over the letters in mainlist (4**4 = 256 strings)
mainlistsplit = [''.join(i) for i in itertools.product(mainlist, repeat=4)]
# Lists that hold the 4-letter windows of each input string
splitinputString1 = []
splitinputString2 = []
splitinputString3 = []
sequence_size = 4
# Slide a window of sequence_size letters across lst, lst2, and lst3, moving one position at a time,
# and append each window (joined into a string) to the matching split list
for i in range(len(lst) - sequence_size + 1):
    sequence = ''.join(lst[i: i + sequence_size])
    splitinputString1.append(sequence)
for i in range(len(lst2) - sequence_size + 1):
    sequence = ''.join(lst2[i: i + sequence_size])
    splitinputString2.append(sequence)
for i in range(len(lst3) - sequence_size + 1):
    sequence = ''.join(lst3[i: i + sequence_size])
    splitinputString3.append(sequence)
found = []
def is_close_match(s1, s2):
    # Count the positions at which the two equal-length strings differ
    mismatches = 0
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            mismatches += 1
    # A close match has at most one differing position
    return mismatches <= 1
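
For reference, here is a minimal sketch of the comparison I think I am after, assuming a candidate counts as "found" when it is within 1 mismatch of at least one 4-letter window in each of the 3 input strings. The helper names windows and find_common are hypothetical names made up for this sketch, and the input strings are written as plain strings instead of lists of letters.

```python
import itertools

SEQUENCE_SIZE = 4


def windows(text, size=SEQUENCE_SIZE):
    """Return every contiguous substring of length `size` in `text`."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def is_close_match(s1, s2, max_mismatches=1):
    """True if two equal-length strings differ in at most `max_mismatches` positions."""
    return sum(a != b for a, b in zip(s1, s2)) <= max_mismatches


def find_common(candidates, window_lists):
    """Keep the candidates that closely match at least one window in every list."""
    return [
        c for c in candidates
        if all(any(is_close_match(c, w) for w in ws) for ws in window_lists)
    ]


# The same three input strings as lst, lst2, and lst3 above
inputs = ["ACTGACGCAG", "TCACAACGGG", "GAGTCCAGTT"]
candidates = [''.join(p) for p in itertools.product("ACTG", repeat=SEQUENCE_SIZE)]
window_lists = [windows(s) for s in inputs]

found = find_common(candidates, window_lists)
print(found)
```

The all(any(...)) nesting is what expresses "in each of the 3 input strings at the same time": any() looks for one close window inside a single input string, and all() requires that to hold for every input string.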