
I have a string which looks like this:

'(a (b (c d e f)) g)'

I want to turn it into a nested list like this:

['a', ['b', ['c', 'd', 'e', 'f']], 'g']

I used this function:

import re

def tree_to_list(text, left=r'[(]', right=r'[)]', sep=r','):
    # split on brackets and separators, keeping them as tokens (capturing group)
    pat = r'({}|{}|{})'.format(left, right, sep)
    tokens = re.split(pat, text)
    stack = [[]]
    for x in tokens:
        if not x or re.match(sep, x): continue
        if re.match(left, x):
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif re.match(right, x):
            stack.pop()
            if not stack:
                raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        print(stack)
        raise ValueError('error: closing bracket is missing')
    return stack.pop()

But the result is not what I expected. There are no commas between the strings:

['a', ['b', ['c' 'd' 'e' 'f']], 'g']

Could you please help me with that?

3 Answers

Answer 1

You can use recursion with a generator:

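import re

data = '(a (b (c d e f)) g)'

def group(d):
    # next token, or ')' if the iterator is exhausted
    a = next(d, ')')
    if a != ')':
        # '(' starts a sublist: recurse to collect it up to the matching ')'
        yield list(group(d)) if a == '(' else a
        # then continue with the remaining tokens at this level
        yield from group(d)

# tokens are words and parentheses; the outermost '(' yields the whole tree first
print(next(group(iter(re.findall(r'\w+|[()]', data)))))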

Output:

['a', ['b', ['c', 'd', 'e', 'f']], 'g']

4 Comments

This is a good approach, but I would recommend iterating over the string directly and handling whitespace yourself rather than using a regex. +1 anyway.
This function removes dots. For example, if I have words instead of letters ('07.45' instead of 'a'), it turns '07.45' into '07' and '45'.
@EdgarZakharyan Simply adjust the regex: re.findall(r'[\w\.]+|[()]', data).
Nice, clean, concise code. Care to add some comments to explain how the function works a bit more clearly?
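The first comment suggests walking the string character by character instead of using a regex. A minimal sketch of that idea follows; it assumes a token is any run of characters other than whitespace and parentheses (so values like '07.45' stay whole, which also covers the dots issue from the second comment) and that the input is a single parenthesized group. The name parse_nested is hypothetical.

```python
def parse_nested(text):
    """Parse '(a (b c) d)'-style text into nested lists without a regex.

    Assumption: a token is any run of characters that are not whitespace
    or parentheses, so '07.45' is kept as one string.
    """
    stack = [[]]   # stack[-1] is the list currently being filled
    token = []     # characters of the token being read

    def flush():
        # append the pending token (if any) to the current list
        if token:
            stack[-1].append(''.join(token))
            token.clear()

    for ch in text:
        if ch == '(':
            flush()
            new = []
            stack[-1].append(new)   # attach the sublist to its parent
            stack.append(new)       # and make it the current list
        elif ch == ')':
            flush()
            if len(stack) == 1:
                raise ValueError('unexpected ")"')
            stack.pop()
        elif ch.isspace():
            flush()
        else:
            token.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError('missing ")"')
    top = stack[0]
    # assumes exactly one top-level (...) group, as in the question
    if len(top) != 1 or not isinstance(top[0], list):
        raise ValueError('expected exactly one top-level (...) group')
    return top[0]


print(parse_nested('(a (b (c d e f)) g)'))
print(parse_nested('(07.45 (b 12.30) g)'))
```

Compared with the generator, this keeps all state in an explicit stack, so deep nesting does not hit Python's recursion limit, at the cost of more code.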
Answer 2

Use string replacements to turn the input into a string containing the equivalent Python literal, then ast.literal_eval to turn that string into the value itself:

>>> import ast, re
>>> data = '(a (b (c d e f)) g)'
>>> s = re.sub(r'(\w+)', r'"\1"', data)         # quote words
>>> s = re.sub(r'\s+', ',', s)                  # whitespace to comma
>>> s = s.replace('(', '[').replace(')', ']')   # () -> []
>>> ast.literal_eval(s)
['a', ['b', ['c', 'd', 'e', 'f']], 'g']
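Like the regex in the first answer, \w+ would split a token such as '07.45'. A sketch of a variant, assuming tokens are any run of characters other than whitespace and parentheses, and that whitespace appears only between items (never right after '(' or right before ')'). It quotes each token with repr() so embedded quote characters are escaped; the function name is hypothetical.

```python
import ast
import re


def parse_with_literal_eval(text):
    # Hypothetical helper. Assumes whitespace only separates items and that
    # tokens contain no whitespace or parentheses.
    s = re.sub(r'[^\s()]+', lambda m: repr(m.group()), text)  # quote tokens safely
    s = re.sub(r'\s+', ',', s)                                 # whitespace to comma
    s = s.replace('(', '[').replace(')', ']')                  # () -> []
    return ast.literal_eval(s)


print(parse_with_literal_eval('(a (b (c d e f)) g)'))
print(parse_with_literal_eval('(07.45 (b 12.30) g)'))
```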

Answer 3

Others have suggested alternative solutions, but the problem with your code is that sep defaults to the regex r',', which matches a single comma. As you say, your text is separated by whitespace, not commas. If you change the default value of sep to r'\s', or call the function as tree_to_list('(a (b (c d e f)) g)', sep=r'\s'), it works for me.
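A sketch to check this, reusing the question's tree_to_list (with import re added and the debug print removed). With sep=r',' nothing matches the separator, so re.split leaves the words between brackets glued together as single strings such as 'c d e f'. Note that the function returns the tree wrapped in one extra list, because stack starts as [[]] and the outermost '(' opens a new sublist; indexing with [0] gives exactly the shape asked for.

```python
import re


def tree_to_list(text, left=r'[(]', right=r'[)]', sep=r','):
    # the question's function, unchanged apart from the removed debug print
    pat = r'({}|{}|{})'.format(left, right, sep)
    tokens = re.split(pat, text)
    stack = [[]]
    for x in tokens:
        if not x or re.match(sep, x):
            continue
        if re.match(left, x):
            stack[-1].append([])
            stack.append(stack[-1][-1])
        elif re.match(right, x):
            stack.pop()
            if not stack:
                raise ValueError('error: opening bracket is missing')
        else:
            stack[-1].append(x)
    if len(stack) > 1:
        raise ValueError('error: closing bracket is missing')
    return stack.pop()


data = '(a (b (c d e f)) g)'
# with sep=',' the separator never matches, so words stay joined
print(re.split(r'([(]|[)]|,)', data))
print(tree_to_list(data))
# with sep=r'\s' each word becomes its own token
print(tree_to_list(data, sep=r'\s')[0])
```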
