0

I've been developing a program that has a function that takes a string like so: ((1.(0.2)2).0) and returns True if it is a regex. Here's mmy code so far:

def is_regex(s):
    #Takes a string 's' and produces True if it is a valid regular expression
    #But False otherwise
    ex = '( + 3* + | + 6* + )'
    leaves = ['1', '2', '0', '1*', '2*', '0*']
    internal_nodes = ['|', '.']
    stripped_string = s.strip('()')

    if len(s.strip('()')) == 1:
        if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()'):
        return True
    elif len(s.strip('()')) == 0:
        return True
    elif stripped_string in leaves[3:6]:
        return True 
    elif len(stripped_string) == 3:
        if stripped_string[0] in leaves and stripped_string[2] in leaves:
            if stripped_string[1] in internal_nodes:
                return True 
    elif '.' in s and len(stripped_string) > 3:
        dot_position = s.rfind('.')
        if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
            is_regex(s[dot_position +1])

The idea here is I want to find the root of the tree, and check if it's two children are valid regexes, if they are I move on a recurse until I'm at the leaf, if it passes all the tests until it hits the leaf then the entire regex is True.

For the last line I have there is_regex(s[dot_position +1]) I'm not getting anything returned, even though I know s[dot_position +1] returns 0 so there should be no problem. In this line I'm checking the right child of . which is 0.

Edit: Another question: I need to test whether or not the left child is true as well. How would I do this? Wouldn't I need to pass both to is_regex? Or should I check if both right and left are true then proceed? This is confusing

1
  • Okay so I just added a return statement to the last line, why does that make it work? Commented Mar 23, 2014 at 0:00

1 Answer 1

1

This is probably the most common recursion bug in existence. If you don't return the result of the recursive call, Python does not return it for you - it keeps running the function as normal. Since there's nothing else after it, it falls off the end without returning - and Python's rules mean that it will then implicitly return None. The result of is_regex(s[dot_position + 1]) is ignored, unless you explicitly return it (or otherwise use it somehow).


You have a similar bug in two of the other paths through this function:

if len(s.strip('()')) == 1:
    if '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()'):
    return True

elif len(stripped_string) == 3:
    if stripped_string[0] in leaves and stripped_string[2] in leaves:
        if stripped_string[1] in internal_nodes:
            return True 

in both of these cases, if the outer if triggers but the inner ifs fail, you will end up returning None. This isn't as serious an issue, because None will still test false when calling code does if is_regex(not_a_regex) - but for the sake of consistency and explicitly handling these cases (instead of essentially forgetting them and having it happen to work), you probably want to return False. The easiest way to do that is to just return the boolean expressions instead of testing them:

if len(s.strip('()')) == 1:
    return '0' in s.strip('()') or '1' in s.strip('()') or '2' in s.strip('()')  or 'e' in s.strip('()')

elif len(stripped_string) == 3:
    return stripped_string[0] in leaves and stripped_string[2] in leaves and stripped_string[1] in internal_nodes

For testing the left child as well, you would indeed have to recurse into both - separately, mind (passing them into the same recursive call might be doable, but is unlikely to work). I would do it this way:

dot_position = s.rfind('.')
if s.rfind('.') > s.rfind(')', 0, len(s)-1): #if '.' is only surrounded by one set of () then it is the root of the tree
    left_child =  s[dot_position - 1]
    right child = s[dot_position + 1]
    return is_regex(left_child) and is_regex(right_child)

This is a very common pattern in any algorithm involving tree structures: test something about the value in the current node, and then test all of its subtrees by recursively calling the same routine on each child in turn; return a result that is dependent on the results of all the recursive calls. Wikipedia calls this a 'depth-first pre-order walk'.

Sign up to request clarification or add additional context in comments.

2 Comments

I see, thanks for that insight. Is there any way you could help me with my second question there? After I check the right and left children how would I go about recursing in to each of those children if they're both trees? It'd be like recursing twice simultaneously
Wow thanks, the answer seems so simple. I'm glad this dilemma is actually documented. I knew I needed a way to determine if both children were valid, thanks again, cheers!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.