I have pattern_string = 'ATAG/GAGAAGATGATG/TATA' and query_string = 'ATAG/AGCAAGATGATG/TATA'. The following fuzzy match with the regex module succeeds for this pair:
import regex  # third-party module with fuzzy matching, not the standard re
r = regex.compile('(%s){e<=2}' % pattern_string)
r.match(query_string)
Here, the only differences are between the two / characters. However, I want fuzziness to be allowed only between these characters; everything outside the / delimiters must match exactly.
For example, pattern_string = 'ATGG/GAGAAGATGATG/TATA' and query_string = 'ATAG/AGCAAGATGATG/TATA' is not a match, because the first part of the string (ATGG vs ATAG) does not match. Similarly, pattern_string = 'ATAG/GAGAAGATGATG/TATG' and query_string = 'ATAG/AGCAAGATGATG/TATA' is also not a match, because the last part of the string (TATG vs TATA) does not match.
In summary, the portion of the string between the / characters (or any other delimiter) should be allowed a fuzzy match according to the constraint given to the regex ({e<=2} in this case), but the portions outside the delimiters must match exactly.
How can this be achieved?
I am imagining a function like the following:
ideal_function(pattern_string, query_string)
where
ideal_function(pattern_string = 'ATAG/GAGAAGATGATG/TATA', query_string = 'ATAG/AGCAAGATGATG/TATA') returns True
ideal_function(pattern_string = 'ATGG/GAGAAGATGATG/TATA', query_string = 'ATAG/AGCAAGATGATG/TATA') returns False
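To make the desired behaviour concrete, here is a minimal pure-Python sketch of such a function. It is not a regex solution and makes several assumptions: each string contains exactly two delimiters, the whole query must be covered (unlike r.match, which only anchors at the start), and the parameters max_errors, substitutions_only and delimiter, as well as the helpers hamming_within and edits_within, are hypothetical names introduced for illustration. Substitution-only matching is modelled as Hamming distance and general errors as Levenshtein distance.

```python
def hamming_within(a, b, limit):
    """True if a and b have equal length and differ in at most `limit` positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return False  # stop as soon as the budget is exceeded
    return True


def edits_within(a, b, limit):
    """True if the Levenshtein distance between a and b is at most `limit`."""
    if abs(len(a) - len(b)) > limit:
        return False  # the length difference alone already needs too many edits
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        if min(cur) > limit:
            return False  # no alignment can get back under the limit
        prev = cur
    return prev[-1] <= limit


def ideal_function(pattern_string, query_string, max_errors=2,
                   substitutions_only=False, delimiter='/'):
    """Exact match outside the delimiters, fuzzy match between them."""
    p_parts = pattern_string.split(delimiter)
    q_parts = query_string.split(delimiter)
    if len(p_parts) != 3 or len(q_parts) != 3:
        return False  # assumption: exactly two delimiters per string
    if p_parts[0] != q_parts[0] or p_parts[2] != q_parts[2]:
        return False  # the outer parts must be identical
    check = hamming_within if substitutions_only else edits_within
    return check(p_parts[1], q_parts[1], max_errors)
```

Note that the two middle parts in the first example differ in three positions, so under this sketch they are within two errors (one deletion plus one insertion) but not within two substitutions.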
The most efficient method would be appreciated. I have to run over 20,000 pattern strings against over 5 million query strings, so it needs to be as fast as possible. It does not necessarily have to be a regex solution, but it must support fuzzy matching by both substitution count (as in {s<=2}) and total error count (as in {e<=2}).
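Because the outer parts must match exactly, one way to cut the number of comparisons at this scale might be to group the patterns by their exact flanks once, so that each query is only compared against patterns sharing the same outer parts. The sketch below illustrates that idea only; build_index and candidate_middles are hypothetical names, and it assumes every string contains exactly two delimiters (otherwise the unpacking raises ValueError).

```python
from collections import defaultdict


def build_index(pattern_strings, delimiter='/'):
    """Group the fuzzy middle parts of the patterns by their exact (left, right) flanks."""
    index = defaultdict(list)
    for pattern in pattern_strings:
        left, middle, right = pattern.split(delimiter)
        index[(left, right)].append(middle)
    return index


def candidate_middles(index, query_string, delimiter='/'):
    """Return the query's middle part and the pattern middles sharing its exact flanks."""
    left, middle, right = query_string.split(delimiter)
    # .get avoids adding empty entries to the defaultdict for unseen flanks
    return middle, index.get((left, right), [])


index = build_index(['ATAG/GAGAAGATGATG/TATA', 'ATGG/GAGAAGATGATG/TATA'])
middle, candidates = candidate_middles(index, 'ATAG/AGCAAGATGATG/TATA')
```

Only the returned candidates then need a fuzzy comparison against the query's middle part, whether by a fuzzy regex or a bounded distance check.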