
I apologize for the very basic question, but I'm really struggling here. I need to make a recursive descent parser. I'm working in Python and using PLY. My grammar follows:

<list> → ( <sequence> ) | ( )

<sequence> → <listelement> , <sequence> | <listelement>

<listelement> → <list> | NUMBER

Would that look something like this? Am I way off? The end goal is to read a list into a data structure and then print it out.

def p_list(p):
    'list : "(" sequence ")" | "(" ")"'

def p_sequence(p):
    'sequence : list_el "," sequence | list_el'

def p_list_el(p):
    'list_el : list | NUMBER'

If anyone is wondering what the full solution is, I'll post it shortly.

  • Does NUMBER require defining, or is it a special definition in PYR? Commented Apr 18, 2013 at 5:13
  • What's PYR? Do you mean PLY? Commented Apr 18, 2013 at 7:10
  • I can't find this supposed PYR with Google. A link to where you got it would be helpful. Although based on what I'm seeing, it really looks like you indeed mean PLY. Commented Apr 18, 2013 at 7:35
  • Sorry I do mean ply - typo Commented Apr 18, 2013 at 9:48
  • @Patashu NUMBER does require defining. Commented Apr 18, 2013 at 10:11

1 Answer


This is how I'd do it:

tokens = ("LBRACKET", "RBRACKET",
          "INTEGER", "FLOAT", "COMMA") # So we can add other tokens
t_LBRACKET = r'\('
t_RBRACKET = r'\)'
t_INTEGER = r'\d+'
t_FLOAT = r'\d+\.\d+'
t_COMMA = r','

def p_list(p):
    """list : LBRACKET sequence RBRACKET
            | LBRACKET RBRACKET"""
    if len(p) == 4:
        p[0] = p[2]   # the sequence is already a Python list
    else:
        p[0] = []     # "()" is the empty list

def p_number(p):
    """number : INTEGER
              | FLOAT"""
    p[0] = p[1]

def p_sequence(p):
    """sequence : list_el COMMA sequence
                | list_el"""
    if len(p) == 4:
        p[0] = [p[1]] + p[3]   # prepend this element to the rest
    else:
        p[0] = [p[1]]          # a single element starts a new list

def p_list_el(p):
    """list_el : number
               | list"""
    p[0] = p[1]
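
For completeness, here is a minimal end-to-end sketch of how rules like these could be wired into a runnable script. It assumes PLY 3.x is installed. The `t_ignore`, `t_error`, and `p_error` definitions, the explicit `start="list"` argument, the conversion of numbers to `int`/`float`, and the helper name `parse_list` are additions for illustration only. Each element is wrapped in a Python list in the sequence rule, so nested lists come out as nested Python lists.

```python
# Sketch only: assumes PLY 3.x is installed (pip install ply).
import ply.lex as lex
import ply.yacc as yacc

tokens = ("LBRACKET", "RBRACKET", "INTEGER", "FLOAT", "COMMA")

t_LBRACKET = r'\('
t_RBRACKET = r'\)'
t_COMMA = r','
t_ignore = ' \t\n'  # characters skipped between tokens

# Function rules are tried in definition order, so FLOAT is defined
# before INTEGER; otherwise "2.5" would start lexing as INTEGER 2.
def t_FLOAT(t):
    r'\d+\.\d+'
    t.value = float(t.value)
    return t

def t_INTEGER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    raise SyntaxError("illegal character %r" % t.value[0])

def p_list(p):
    """list : LBRACKET sequence RBRACKET
            | LBRACKET RBRACKET"""
    p[0] = p[2] if len(p) == 4 else []

def p_sequence(p):
    """sequence : list_el COMMA sequence
                | list_el"""
    p[0] = [p[1]] + p[3] if len(p) == 4 else [p[1]]

def p_list_el(p):
    """list_el : number
               | list"""
    p[0] = p[1]

def p_number(p):
    """number : INTEGER
              | FLOAT"""
    p[0] = p[1]

def p_error(p):
    # p is None when the input ends unexpectedly
    raise SyntaxError("unexpected %r" % (p.value if p else "end of input"))

lexer = lex.lex()
# debug/write_tables are turned off so no parser.out or parsetab.py is written
parser = yacc.yacc(start="list", debug=False, write_tables=False)

def parse_list(text):
    """Hypothetical convenience wrapper: parse text into nested Python lists."""
    return parser.parse(text, lexer=lexer)

print(parse_list("(1, (2.5, 3), ())"))
```

Printing the result covers the "read it into a data structure and print it out" goal from the question.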

Edit:
Quick explanation of the extra tokens: everything in the input must eventually boil down to a token or literal character you've defined, so adding more tokens is legal. Specifying them all as named tokens makes the grammar easier to read and work with.
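
One caveat on terminology: PLY's yacc module builds a table-driven LALR parser, not a recursive descent one. If a hand-written recursive descent parser is what you actually need, the grammar from the question could be sketched roughly as below, with one method per nonterminal. The names `tokenize`, `ListParser`, and `parse` are hypothetical, and the right-recursive `<sequence>` rule is written as a loop, which accepts the same input.

```python
import re

# Hypothetical names throughout; this sketch does not use PLY.
TOKEN_RE = re.compile(r'(?P<num>\d+\.\d+|\d+)|(?P<sym>\S)')

def tokenize(text):
    """Turn text into a list of numbers and the symbols '(', ')' and ','."""
    tokens = []
    for match in TOKEN_RE.finditer(text):  # whitespace is skipped implicitly
        lexeme = match.group()
        if match.lastgroup == 'num':
            tokens.append(float(lexeme) if '.' in lexeme else int(lexeme))
        elif lexeme in '(),':
            tokens.append(lexeme)
        else:
            raise SyntaxError("illegal character %r" % lexeme)
    return tokens

class ListParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError("expected %r, got %r" % (symbol, self.peek()))
        self.pos += 1

    def parse_list(self):
        # <list> -> ( <sequence> ) | ( )
        self.expect('(')
        items = [] if self.peek() == ')' else self.parse_sequence()
        self.expect(')')
        return items

    def parse_sequence(self):
        # <sequence> -> <listelement> , <sequence> | <listelement>
        items = [self.parse_listelement()]
        while self.peek() == ',':
            self.pos += 1  # consume the comma
            items.append(self.parse_listelement())
        return items

    def parse_listelement(self):
        # <listelement> -> <list> | NUMBER
        token = self.peek()
        if token == '(':
            return self.parse_list()
        if isinstance(token, (int, float)):
            self.pos += 1
            return token
        raise SyntaxError("expected a list or number, got %r" % (token,))

def parse(text):
    parser = ListParser(tokenize(text))
    result = parser.parse_list()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input %r" % (parser.peek(),))
    return result

print(parse("(1, (2.5, 3), ())"))
```

Each `parse_*` method mirrors one grammar rule, which is the defining trait of recursive descent: nesting in the input is handled by `parse_listelement` calling back into `parse_list`.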


7 Comments

I was just going to comment on COMMA being missing, but I see you've added it now. Good job!
Yup, I missed the comma in the original post at first. Oops!
Okay, this is extremely helpful. I follow almost all of it. My only question is why are you checking on len(p) == 4? Shouldn't it be on len(p) == 3 in both cases ((LBRACKET sequence RBRACKET) or (list_el COMMA sequence))?
Never mind, I've got it now. In the first case list is p[0] and in the second case sequence is p[0].
Correct: p[0] is the symbol on the left-hand side of the rule, and the rest (p[1] onward) is everything on the right. Basically.