4

I have got a tree structure file,where parenthesis are used to represent the tree. Here is the code toconvert the same to a python nested list

def foo(s):
    def foo_helper(level=0):
        try:
            token = next(tokens)
        except StopIteration:
            if level != 0:
                raise Exception('missing closing paren')
            else:
                return []
        if token == ')':
            if level == 0:
                raise Exception('missing opening paren')
            else:
                return []
        elif token == '(':
            return [foo_helper(level+1)] + foo_helper(level)
        else:
            return [token] + foo_helper(level)
    tokens = iter(s)
    return foo_helper()    

as given by how to parse a strign and return nested array.

Here it works fine, when the characters are 1 in length. for words or sentences, the same doesnt properly work. My sample of tree is:

( Satellite (span 69 74) (rel2par Elaboration)
        ( Nucleus (span 69 72) (rel2par span)
          ( Nucleus (span 69 70) (rel2par span)
            ( Nucleus (leaf 69) (rel2par span) (text _!MERRILL LYNCH READY ASSETS TRUST :_!) )
            ( Satellite (leaf 70) (rel2par Elaboration) (text _!8.65 % ._!) )
          )
          ( Satellite (span 71 72) (rel2par Elaboration)
            ( Nucleus (leaf 71) (rel2par span) (text _!Annualized average rate of return_!) )
            ( Satellite (leaf 72) (rel2par Temporal) (text _!after expenses for the past 30 days ;_!) )
          )
        )
        ( Satellite (span 73 74) (rel2par Elaboration)
          ( Nucleus (leaf 73) (rel2par span) (text _!not a forecast_!) )
          ( Satellite (leaf 74) (rel2par Elaboration) (text _!of future returns ._!) )
        )
      )

here, the output need to be ['satellite',['span','69','74'].........] but with the given function what I get is ['s','a','t'...............['s','p','a','n','7','3']..............]

How can this be modified?

3
  • 1
    Have a look on this question: stackoverflow.com/questions/14058985/… Commented Apr 12, 2014 at 2:20
  • Just a quick question unrealted to the OPs question: How does foo_helper know tokens? I would've thought that a NameError would've been raised, since tokens is defined below the definition of foo_helper. Commented Apr 12, 2014 at 2:27
  • @SteinarLima Look at my answer which is formatted a bit better. It's an inner function and it's not called till the last line of foo basically. Commented Apr 12, 2014 at 2:33

2 Answers 2

1

I assumed you wanted to denote strings-with-whitespace using _!. I then split the expression using regular expressions:

from re import compile
resexp = compile(r'([()]|_!)')
…
  tokens = iter(resexp.split(s))
…

My result was (using pprint with depth=4)

$ python lispparse.py  | head
['\n',
 [' Satellite ',
  ['span 69 74'],
  ' ',
  ['rel2par Elaboration'],
  '\n        ',
  [' Nucleus ',
   ['span 69 72'],
   ' ',
   ['rel2par span'],

I refined it a little bit further with:

tokens = iter(filter(None, (i.strip() for i in resexp.split(s))))

and got:

$ python lispparse.py  
[['Satellite',
  ['span 69 74'],
  ['rel2par Elaboration'],
  ['Nucleus',
   ['span 69 72'],
   ['rel2par span'],
   ['Nucleus', [...], [...], [...], [...]],
   ['Satellite', [...], [...], [...], [...]]],
  ['Satellite',
   ['span 73 74'],
   ['rel2par Elaboration'],
   ['Nucleus', [...], [...], [...]],
   ['Satellite', [...], [...], [...]]]]]
Sign up to request clarification or add additional context in comments.

Comments

1

You are intended to call this function not on the string itself, but over a list of tokens, i.e. the string split:

def parse(s):
    def parse_helper(level=0):
        try:
            token = next(tokens)
        except StopIteration:
            if level:
                raise Exception('Missing close paren')
            else:
                return []
        if token == ')':
            if not level:
                raise Exception('Missing open paren')
            else:
                return []
        elif token == '(':
            return [parse_helper(level+1)] + parse_helper(level)
        else:
            return [token] + parse_helper(level)
    tokens = iter(s)
    return parse_helper()

if __name__ == '__main__':
    with open('tree.thing', 'r') as treefile:
        tree = treefile.read()

    print(parse(tree.split()))

Where treefile contains the data structure you posted, I get this output:

[['Satellite', '(span', '69', '74)', '(rel2par', 'Elaboration)', ['Nucleus', '(span', '69', '72)', '(rel2par', 'span)', ['Nucleus', '(span', '69', '70)', '(rel2par', 'span)', ['Nucleus', '(leaf', '69)', '(rel2par', 'span)', '(text', '_!MERRILL', 'LYNCH', 'READY', 'ASSETS', 'TRUST', ':_!)'], ['Satellite', '(leaf', '70)', '(rel2par', 'Elaboration)', '(text', '_!8.65', '%', '._!)']], ['Satellite', '(span', '71', '72)', '(rel2par', 'Elaboration)', ['Nucleus', '(leaf', '71)', '(rel2par', 'span)', '(text', '_!Annualized', 'average', 'rate', 'of', 'return_!)'], ['Satellite', '(leaf', '72)', '(rel2par', 'Temporal)', '(text', '_!after', 'expenses', 'for', 'the', 'past', '30', 'days', ';_!)']]], ['Satellite', '(span', '73', '74)', '(rel2par', 'Elaboration)', ['Nucleus', '(leaf', '73)', '(rel2par', 'span)', '(text', '_!not', 'a', 'forecast_!)'], ['Satellite', '(leaf', '74)', '(rel2par', 'Elaboration)', '(text', '_!of', 'future', 'returns', '._!)']]]]

4 Comments

Note that parse and parse_helper are just renamed from foo and foo_helper in the OP.
yes i am aware of this, but the problem is instead '_!MERRILL', 'LYNCH', 'READY', 'ASSETS', 'TRUST', I need to show it as a single item element like '_!MERRILL LYNCH READY ASSETS TRUST
Yeah I'm working on adjusting the code. Stay tuned for an update.
Ugh, a storm is interfering with my internet connection. tl;dr this code is difficult to improve and now I can't work on it, so you should look into pyparsing (or use the other answer).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.