Python: looping through a string and matching it with a wildcard pattern

Question

string1="abc"
string2="abdabcdfg"

I want to find out whether string1 is a substring of string2. However, there are wildcard characters: "." can be any letter, "y" can be "a" or "d", and "x" can be "b" or "c". As a result, ".yx" would count as a substring of string2.

How can I code this using only one loop? I want to loop through string2 and make comparisons at each index. I tried a dictionary, but I want to use a loop. My code:

def wildcard(string,substring):
    sum=""
    table={'A': '.', 'C': '.', 'G': '.', 'T': '.','A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}
    for c in strand:
        if (c in table) and table[c] not in sum:
            sum+=table[c]
        elif c not in table:
            sum+=c
    if sum==substring:
        return True
    else:
        return False

print wildcard("TTAGTTA","xyT.")#should be true

You should use a dictionary and a loop. In the dictionary, store which characters can be matched by each symbol, and in the loop use that to check each character.
– tobias_k, Commented Jul 10, 2014 at 13:08
Alternatively, translate your pattern into a regex, e.g. .+# -> [a-z][bc][ad], and then match that regex. I think this is much better, but it does not use a loop.
– tobias_k, Commented Jul 10, 2014 at 13:09
Could you please post the code you have already written and describe what is wrong with it?
– Stefano Sanfilippo, Commented Jul 10, 2014 at 13:10
You cannot add multiple instances of the same key to a dictionary. If you write {'A': '.', ..., 'A': 'x'}, then A maps to only one of the two values, not to both. You could use 'A': '.x' or 'A': ['x','.'] instead; see my answer below (although I do the mapping the other way around).
– tobias_k, Commented Jul 10, 2014 at 14:17
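
The duplicate-key point from the last comment can be checked directly. This is only a small sketch: `table` is copied from the question, while `symbols` is an assumed reading of what the wildcards are meant to stand for ('.' any base, 'x' A or T, 'y' C or G), not the asker's own code.

```python
# Duplicate keys in a dict literal are legal, but the last value silently wins.
table = {'A': '.', 'C': '.', 'G': '.', 'T': '.',
         'A': 'x', 'T': 'x', 'C': 'y', 'G': 'y'}
print(table)  # only four keys survive, none of them maps to '.'

# Assumed fix (hypothetical mapping): go from each wildcard symbol to the
# characters it may stand for, so that no key has to appear twice.
symbols = {'.': 'ACGT', 'x': 'AT', 'y': 'CG'}
print('A' in symbols['x'] and 'A' in symbols['.'])
```

With the mapping turned around like this, a character can belong to several wildcards at once.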

tobias_k · Accepted Answer · 2014-07-10 14:33:49Z


I know you are specifically asking for a solution using a loop. However, I would propose a different approach: you can easily translate your pattern into a regular expression. Regular expressions are a similar language for string patterns, just much more powerful. You can then use the re module to check whether that regular expression (and thus your substring pattern) can be found in the string.

import re

def to_regex(pattern, table):
    # join substitutions from table, using c itself as default
    return ''.join(table.get(c, c) for c in pattern)

# each wildcard symbol maps to a regex character class
symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
print(re.findall(to_regex('.+#', symbols), 'abdabcdfg'))
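
Applied to the example from the question, the same idea might look like the sketch below. The wildcard meanings in `DNA_SYMBOLS` are an assumption read off the question's table ('.' any base, 'x' A or T, 'y' C or G), and the `re.escape` call is an addition that is not in the code above: it only matters if the pattern contains literal characters that are special in regular expressions.

```python
import re

# Assumed wildcard meanings (hypothetical, inferred from the question):
# '.' is any base, 'x' is A or T, 'y' is C or G.
DNA_SYMBOLS = {'.': '[ACGT]', 'x': '[AT]', 'y': '[CG]'}

def to_regex(pattern, table):
    # Wildcards become character classes; every other character is escaped
    # so that it is matched literally.
    return ''.join(table.get(c, re.escape(c)) for c in pattern)

def wildcard(string, pattern, table=DNA_SYMBOLS):
    # True if the pattern matches anywhere in the string.
    return re.search(to_regex(pattern, table), string) is not None

print(wildcard("TTAGTTA", "xyT."))  # True: "AGTT" matches
print(wildcard("TTAGTTA", "yyyy"))  # False: no run of four C/G bases
```

`re.search` is used instead of `re.findall` because the question only asks for a yes/no answer.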

If you prefer a more "hands-on" solution, you can use this, using loops.

def find_matches(pattern, table, string):
    for i in range(len(string) - len(pattern) + 1):
        # for each possible starting position, check the pattern
        for j, c in enumerate(pattern):
            if string[i+j] not in table.get(c, c):
                break # character does not match
        else:
            # loop completed without triggering the break
            yield string[i : i + len(pattern)]

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', '#': 'ad', '+': 'bc'}
print(list(find_matches('.+#', symbols, 'abdabcdfg')))

The output in both cases is ['abd', 'bcd'], i.e. the pattern can be found twice using these substitutions.
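
Since the question asks for a single loop and a yes/no answer, the loop version could also be condensed along the following lines. This is a sketch, not part of the original answer: `contains_match` is a hypothetical name, and whether it really counts as "only one loop" is debatable, because `all()` still iterates over the pattern internally.

```python
def contains_match(pattern, table, string):
    n = len(pattern)
    # One explicit loop over the possible starting positions.
    for i in range(len(string) - n + 1):
        window = string[i:i + n]
        # all() stops at the first mismatch, much like the break in the
        # nested-loop version.
        if all(s in table.get(p, p) for p, s in zip(pattern, window)):
            return True
    return False

symbols = {'.': 'abcdefghijklmnopqrstuvwxyz', '#': 'ad', '+': 'bc'}
print(contains_match('.+#', symbols, 'abdabcdfg'))  # True
print(contains_match('+++', symbols, 'abdabcdfg'))  # False
```

As in the generator version, characters that are not in the table are compared literally.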

edited Jul 10, 2014 at 14:33

answered Jul 10, 2014 at 13:23

tobias_k


1 Comment

tobias_k Over a year ago

From your now-added code it seems you are trying to match genetic codes, so this looks like a practical problem rather than homework. Particularly in this case I would strongly recommend my first solution, as it should have much better performance.
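
One caveat when choosing the regex solution over the loop: `re.findall` reports only non-overlapping matches, whereas a loop that tries every starting position also reports overlapping ones. For the example in the answer the two happen to agree. If overlapping matches matter, one common workaround is to wrap the pattern in a lookahead with a capturing group, as sketched here (the two-wildcard pattern and the string 'abcd' are made up for illustration):

```python
import re

def to_regex(pattern, table):
    # Same translation as in the answer: substitute wildcards, keep the rest.
    return ''.join(table.get(c, c) for c in pattern)

symbols = {'.': '[a-z]', '#': '[ad]', '+': '[bc]'}
regex = to_regex('..', symbols)

# Plain findall consumes each match, so the next search starts after it.
print(re.findall(regex, 'abcd'))                  # ['ab', 'cd']

# The lookahead makes each match zero-width, so every starting position
# is tried; the capturing group returns the text that would have matched.
print(re.findall('(?=(' + regex + '))', 'abcd'))  # ['ab', 'bc', 'cd']
```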
