Recursive Descent Parser using Python and PLY

Question

I apologize for my very basic question, but I'm really struggling here. I need to make a recursive descent parser. I'm working in Python and using PLY. My grammar follows:

< list > → (< sequence >) | ()

< sequence > → < listelement > , < sequence > | < listelement >

< listelement > → < list > | NUMBER

Would that look something like this? Am I way off? The end goal is to read a list into a data structure and then print it out.

def p_list(p):
    'list : "("sequence")" | "("")"'

def p_sequence(p):
    'sequence : list_el","sequence | list_el'

def p_list_el(p):
    'list_el : list | NUMBER'

If anyone was wondering what the full solution was, I'll post it shortly.

Does NUMBER require defining, or is it a special definition in PYR? – Patashu, commented Apr 18, 2013 at 5:13
I can't find this supposed PYR with Google. A link to where you got it would be helpful. That said, based on what I'm seeing, it really looks like you do mean PLY. – Karl Knechtel, commented Apr 18, 2013 at 7:35

TyrantWave · Accepted Answer · 2013-04-18 13:40:07Z


This is how I'd do it:

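import ply.lex as lex
import ply.yacc as yacc

tokens = ("LBRACKET", "RBRACKET",
          "INTEGER", "FLOAT", "COMMA") # So we can add other tokens
t_LBRACKET = r'\('
t_RBRACKET = r'\)'
t_INTEGER = r'\d+'
t_FLOAT = r'\d+\.\d+'  # PLY tries longer string patterns first, so "3.5" is a FLOAT
t_COMMA = r','
t_ignore = ' \t'       # skip spaces and tabs between tokens

def t_error(t):
    raise SyntaxError("Illegal character %r" % t.value[0])

def p_list(p):
    """list : LBRACKET sequence RBRACKET
            | LBRACKET RBRACKET"""
    if len(p) == 4:
        p[0] = p[2]
    else:
        p[0] = []  # "()" is an empty list

def p_number(p):
    """number : INTEGER
              | FLOAT"""
    # Token values are strings; turn them into Python numbers
    p[0] = float(p[1]) if '.' in p[1] else int(p[1])

def p_sequence(p):
    """sequence : list_el COMMA sequence
                | list_el"""
    if len(p) == 4:
        p[0] = [p[1]] + p[3]  # this element followed by the rest
    else:
        p[0] = [p[1]]

def p_list_el(p):
    """list_el : number
               | list"""
    p[0] = p[1]

def p_error(p):
    raise SyntaxError("Syntax error at %r" % (p.value if p else "end of input"))

lexer = lex.lex()
parser = yacc.yacc()
print(parser.parse("(1, (2, 3.5), ())"))  # [1, [2, 3.5], []]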
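Note that PLY's yacc module builds a table-driven LALR(1) parser from these rules, not a recursive descent one. If the assignment strictly requires recursive descent, one option is to hand-write one function per nonterminal of the question's grammar. The following is a minimal sketch under that assumption; tokenize, Parser and parse are hypothetical names (not part of PLY), and the tokenizer assumes the same number formats as the lexer above (digits, optionally with a decimal part).

```python
import re

# Hypothetical tokenizer: numbers (float or int) and the punctuation ( ) ,
TOKEN_RE = re.compile(r"\d+\.\d+|\d+|[(),]")


def tokenize(text):
    """Split text into '(' / ')' / ',' strings and int/float values."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tok = m.group()
        if tok in "(),":
            tokens.append(tok)
        else:
            tokens.append(float(tok) if "." in tok else int(tok))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: one method per nonterminal in the grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_list(self):
        # <list> -> ( <sequence> ) | ( )
        self.expect("(")
        if self.peek() == ")":
            self.pos += 1
            return []
        items = self.parse_sequence()
        self.expect(")")
        return items

    def parse_sequence(self):
        # <sequence> -> <listelement> , <sequence> | <listelement>
        items = [self.parse_listelement()]
        if self.peek() == ",":
            self.pos += 1
            items += self.parse_sequence()
        return items

    def parse_listelement(self):
        # <listelement> -> <list> | NUMBER
        tok = self.peek()
        if tok == "(":
            return self.parse_list()
        if isinstance(tok, (int, float)):
            self.pos += 1
            return tok
        raise SyntaxError(f"expected '(' or a number, got {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    result = parser.parse_list()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return result


print(parse("(1, (2, 3.5), ())"))  # [1, [2, 3.5], []]
```

parse_sequence mirrors the right-recursive rule literally; for very long lists, a while loop over commas would avoid hitting Python's recursion limit.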

Edit:
Quick explanation of the extra tokens: everything in the input must eventually match a token or character you've defined; that is what makes it legal input for the lexer. Naming them all as tokens makes the grammar easier to read and work with.
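Naming every punctuation mark is a readability choice rather than a requirement. PLY also accepts single characters declared in a literals variable and quoted directly in the grammar (as '(' etc.), which is close to what the question attempted with "(". A minimal sketch of the same grammar written that way, assuming PLY 3.x; splitting list_el into p_list_el_int, p_list_el_float and p_list_el_list is just an illustrative way to convert token strings to numbers.

```python
import ply.lex as lex
import ply.yacc as yacc

# Single-character tokens returned as-is; each one's token type is the character itself
literals = "(),"

tokens = ("INTEGER", "FLOAT")
t_FLOAT = r"\d+\.\d+"
t_INTEGER = r"\d+"
t_ignore = " \t"


def t_error(t):
    raise SyntaxError(f"Illegal character {t.value[0]!r}")


def p_list(p):
    """list : '(' sequence ')'
            | '(' ')'"""
    p[0] = p[2] if len(p) == 4 else []


def p_sequence(p):
    """sequence : list_el ',' sequence
                | list_el"""
    if len(p) == 4:
        p[0] = [p[1]] + p[3]
    else:
        p[0] = [p[1]]


def p_list_el_int(p):
    "list_el : INTEGER"
    p[0] = int(p[1])


def p_list_el_float(p):
    "list_el : FLOAT"
    p[0] = float(p[1])


def p_list_el_list(p):
    "list_el : list"
    p[0] = p[1]


def p_error(p):
    raise SyntaxError(f"Syntax error at {p.value if p else 'end of input'!r}")


lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)
print(parser.parse("(1, (2, 3.5), ())"))  # [1, [2, 3.5], []]
```

Literals must be single characters, so multi-character tokens such as numbers still need ordinary token rules.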

edited Apr 18, 2013 at 13:40

answered Apr 18, 2013 at 10:06

TyrantWave



Comments

John Szakmeister Over a year ago

I was just going to comment on COMMA being missing, but I see you've added it now. Good job!

TyrantWave Over a year ago

Yup, I missed the comma in the original post at first. Oops!

Ryan Over a year ago

Okay, this is extremely helpful. I follow almost all of it. My only question is: why are you checking len(p) == 4? Shouldn't it be len(p) == 3 in both cases ((LBRACKET sequence RBRACKET) or (list_el COMMA sequence))?

Ryan Over a year ago

Never mind, I've got it now. In the first case list is p[0], and in the second case sequence is p[0].

TyrantWave Over a year ago

Correct: p[0] is the left-hand side of the rule, and p[1] onward are the symbols on the right-hand side, in order.
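To see the indexing in action, the sketch below records len(p) and the right-hand-side values each time a rule is reduced. It assumes PLY 3.x and uses a cut-down grammar (sequence of INTEGERs only) so the trace stays short; the reductions list is a hypothetical helper added for illustration. Since p[0] is the left-hand side, a rule with three symbols on the right gives len(p) == 4, and a rule with one gives len(p) == 2.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ("LBRACKET", "RBRACKET", "INTEGER", "COMMA")
t_LBRACKET = r"\("
t_RBRACKET = r"\)"
t_INTEGER = r"\d+"
t_COMMA = r","
t_ignore = " "


def t_error(t):
    raise SyntaxError(f"Illegal character {t.value[0]!r}")


reductions = []  # hypothetical trace: (rule name, len(p), values of p[1], p[2], ...)


def p_list(p):
    """list : LBRACKET sequence RBRACKET
            | LBRACKET RBRACKET"""
    reductions.append(("list", len(p), [p[i] for i in range(1, len(p))]))
    p[0] = p[2] if len(p) == 4 else []


def p_sequence(p):
    """sequence : INTEGER COMMA sequence
                | INTEGER"""
    reductions.append(("sequence", len(p), [p[i] for i in range(1, len(p))]))
    if len(p) == 4:
        p[0] = [int(p[1])] + p[3]
    else:
        p[0] = [int(p[1])]


def p_error(p):
    raise SyntaxError("Syntax error")


lexer = lex.lex()
parser = yacc.yacc(debug=False, write_tables=False)
result = parser.parse("(1, 2)")
for rule, length, rhs in reductions:
    print(rule, "len(p) =", length, "rhs =", rhs)
# sequence len(p) = 2 rhs = ['2']
# sequence len(p) = 4 rhs = ['1', ',', [2]]
# list len(p) = 4 rhs = ['(', [1, 2], ')']
```

The innermost sequence is reduced first because the parser only knows "2" is the last element once it sees the closing bracket.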
