My goal is to parse every character in the following string with the patterns I have created with PyParsing. I have two nested structures I am trying to parse, the control structure and the macro structure, and both span multiple lines.

    """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """

These are my parsers. They work fine for parsing lines from a multi-file project up until I start trying to parse nested control and macro structures.

    comment_parser = (Literal(";") + SkipTo(LineEnd()))

    charmap_parser = CaselessKeyword("charmap") + QuotedString("\"") + \
                     Literal(",").suppress() + Word(hexnums + "$") + Opt(comment_parser)

    expression = infix_notation(Word(printables, exclude_chars="() ** ~ + - * / % & | ^ != == <= >= < > !"),
                                [
                                    ("()", 2, OpAssoc.LEFT),
                                    ("**", 2, OpAssoc.LEFT),
                                    (one_of("~ + -"), 1, OpAssoc.RIGHT),
                                    (one_of("* / %"), 2, OpAssoc.LEFT),
                                    (one_of("<< >>"), 2, OpAssoc.LEFT),
                                    (one_of("& | ^"), 2, OpAssoc.LEFT),
                                    ("+ -", 2, OpAssoc.LEFT),
                                    ("!= == <= >= < >", 2, OpAssoc.LEFT),
                                    ("&& ||", 2, OpAssoc.LEFT),
                                    ("!", 1, OpAssoc.RIGHT),
                                ])

    elif_parser = CaselessKeyword("elif") + expression

    if_parser = CaselessKeyword("if") + expression

    include_parser = CaselessLiteral("include") + QuotedString("\"") + Opt(comment_parser)
    include_parser.add_parse_action(parse_include)

    label = Word(printables, excludeChars=":") + Literal(":")

    newcharmap_parser = CaselessKeyword("newcharmap") + Word(printables) + Opt(comment_parser)

    numeric_assignment = Word(printables) + Literal("=") + Word(printables)

    popc = CaselessKeyword("popc") + Opt(comment_parser)

    pushc = CaselessKeyword("pushc") + Opt(comment_parser)

    redef = CaselessKeyword("redef") + Word(printables) + \
            (CaselessKeyword("equ") ^ CaselessKeyword("equs")) + \
            QuotedString("\"")

    all_rgbasm_parsers = Forward()

    control = Forward()

    macro_parser = Forward()

    all_rgbasm_parsers <<= (charmap_parser ^ comment_parser ^ include_parser ^ newcharmap_parser ^
                            numeric_assignment ^ popc ^ pushc ^ redef ^ control ^ macro_parser ^ label)

    control <<= if_parser + OneOrMore(all_rgbasm_parsers) + Opt(elif_parser ^ CaselessKeyword("else")) + \
        ZeroOrMore(all_rgbasm_parsers) + CaselessKeyword("endc")

    macro_parser <<= Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") + \
                   OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))

I expect the macro_parser to return a nested list of results from parsing the above string.

The problem is that the macro_parser does not work. I end up with "Expected end of text, found 'MACRO'", which is a very unhelpful error message.

If I remove label from all_rgbasm_parsers, I get an even worse message: "Expected end of text, found 'table'". I get the same error message when trying to parse with this:

    ((Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
                   OneOrMore(all_rgbasm_parsers) + FollowedBy(CaselessKeyword("endm"))) ^ comment_parser)

I see nowhere in the expression above where it would expect a newline at the start of a line. I may be overlooking something. It appears that Word(printables, excludeChars=":") does not include the character _ when it parses despite the fact that string.printable includes it.

I am testing the parser with this:

    test = """
    ; Macros to verify assumptions about the data or code

    table_width: MACRO
    CURRENT_TABLE_WIDTH = \\1
    if _NARG == 2
    REDEF CURRENT_TABLE_START EQUS "\\2"
    else
    REDEF CURRENT_TABLE_START EQUS "._table_width\@"
    {CURRENT_TABLE_START}:
    endc
    ENDM
    """
    from rgbasm_parsers import all_rgbasm_parsers
    all_parsers = OneOrMore(Group(all_rgbasm_parsers))
    print(all_parsers.parse_string(test, parseAll=True))

I have tested OneOrMore(Group(all_rgbasm_parsers)) with files that include no nested structures, and that gives me the correct results, so I do not think that that code is the problem, though I may be wrong.

It may be that part of the problem is that the nested structures span multiple lines, but "Expected end of text, found 'table'" makes me think otherwise.

I think I might be using Forward wrong.

Any ideas?

1 Answer

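I found two things wrong.

First, some one_of calls were missing in the infix_notation operator list. Without one_of, a string such as "+ -" is matched as one literal token, space included, rather than as a set of alternative operators.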

    expression = infix_notation(Word(
        printables,
        exclude_chars=" ** ~ + - * / % & | ^ != == <= >= < > ! , += -= *= /= %= <<= >>= &= |= ^="
    ),
        [
            ("**", 2, OpAssoc.LEFT),
            (one_of("~ + -"), 1, OpAssoc.RIGHT),
            (one_of("* / % *= /= %="), 2, OpAssoc.LEFT),
            (one_of("<< >> <<= >>="), 2, OpAssoc.LEFT),
            (one_of("& | ^ &= |= ^="), 2, OpAssoc.LEFT),
            (one_of("+ - += -="), 2, OpAssoc.LEFT),
            (one_of("!= == <= >= < >"), 2, OpAssoc.LEFT),
            (one_of("&& ||"), 2, OpAssoc.LEFT),
            ("!", 1, OpAssoc.RIGHT),
        ])
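
To illustrate the difference, here is a minimal sketch. It uses a simplified integer operand instead of the grammar above, and the names operand, broken and fixed are made up for this example. It shows a bare "+ -" string failing to match a lone "+", while one_of("+ -") accepts either operator:

```python
import pyparsing as pp

# Simplified operand for illustration only (not the Word(printables, ...) used above).
operand = pp.Word(pp.nums)

# A bare string is turned into a single Literal, so this operator is the
# three-character text "+ -", not "+" or "-".
broken = pp.infix_notation(operand, [("+ -", 2, pp.OpAssoc.LEFT)])

# one_of splits the string on whitespace and matches any one of the pieces.
fixed = pp.infix_notation(operand, [(pp.one_of("+ -"), 2, pp.OpAssoc.LEFT)])

print(fixed.parse_string("1 + 2", parse_all=True).as_list())

# The broken grammar only matches the first operand and stops at "+".
print(broken.parse_string("1 + 2").as_list())

try:
    broken.parse_string("1 + 2", parse_all=True)
    broken_raised = False
except pp.ParseException as err:
    broken_raised = True
    print(err)
```

With parse_all=True the broken grammar fails right after the first operand, which is the same kind of "Expected end of text" error as in the question.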

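Second, the "endm" was not being consumed, resulting in a ParseException. FollowedBy only looks ahead without advancing the parse position, so the keyword has to be matched explicitly afterwards.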

    macro_parser <<= (Word(printables, excludeChars=":") + Literal(":").suppress() + CaselessLiteral("macro") +
                      OneOrMore(all_rgbasm_parsers) +
                      FollowedBy(CaselessKeyword("endm"))) + CaselessKeyword("endm")
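
The lookahead behaviour can be seen in isolation with a small sketch. The grammar here is a toy stand-in (plain alphabetic words terminated by ENDM), and the names word, endm, body, lookahead_only and consuming are hypothetical, not part of the parsers above:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
endm = pp.CaselessKeyword("endm")

# Body: one or more words, stopping before the terminating keyword.
body = pp.OneOrMore(~endm + word)

# FollowedBy checks that ENDM comes next but does not consume it.
lookahead_only = body + pp.FollowedBy(endm)

# Matching the keyword itself consumes it.
consuming = body + endm

text = "foo bar ENDM"

# Without parse_all the lookahead version succeeds, leaving ENDM unparsed.
print(lookahead_only.parse_string(text).as_list())

# With parse_all=True the leftover ENDM triggers a ParseException.
try:
    lookahead_only.parse_string(text, parse_all=True)
    lookahead_raised = False
except pp.ParseException as err:
    lookahead_raised = True
    print(err)

print(consuming.parse_string(text, parse_all=True).as_list())
```

Once the keyword is consumed explicitly, the FollowedBy in front of it is redundant; it is harmless to keep, but dropping it should give the same result.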