
I'm new to parsers and I have a problem with my parser, specifically when it calls itself to analyze the body of a function.

When it finds a nested function, the resulting tree becomes a mess.

Basically, when analyzing this code:

fn test (a, b, c):
    fn test2 (c, b, a):
        print("Hello world")
    end
end

The node starts pointing to itself instead of to the subfunction:

>>> print(ast[0].value.body[9])
<ast.VariableAST object at 0x7f6285540710>  
>>> print(ast[0].value.body[9].value.body[9])
<ast.VariableAST object at 0x7f6285540710>

This is the main parser code:

# Parser Loop
def Parser(tokenList):
    global tokens
    global size
    tokens = tokenList
    size = len(tokens)
    ast = []
    while i < size:
        ast.append(MainParser())
    return ast


# The Main Parser
def MainParser():
    global i
    if tokens[i] == 'fn':
        i += 1
        node = FunctionParser()
    else:
        node = tokens[i]
        i += 1
    return node


# Parse a function
def FunctionParser():
    global i
    checkEnd("function")
    if tokens[i][0].isalpha():
        node = VariableAST()
        node.name = tokens[i]
        i += 1
        node.value = FunctionBodyParser()
    elif tokens[i] == '(':
        node = FunctionBodyParser()
    else:
        syntaxError("Expecting '(' or function name")
    return node


# Parse a function body
def FunctionBodyParser():
    global i
    i += 1
    node = FunctionAST()
    while True:
        checkEnd("function")
        if tokens[i][0].isalpha():
            node.args.append(tokens[i])
            i += 1
        elif tokens[i] == ',':
            i += 1
        elif tokens[i] == ')':
            break
        else:
            syntaxError("Expecting ')' or ',' in function declaration")
    i += 1
    checkEnd("function")
    if tokens[i] != ':' and tokens[i] != '{':
        syntaxError("Expecting '{' or ':'")
    begin = tokens[i]
    while True:
        checkEnd("function")
        if begin == ':' and tokens[i] == 'end':
            break
        elif begin == '{' and tokens[i] == '}':
            break
        else:
            node.body.append(MainParser())
    i += 1
    return node

Edit: I forgot to mention that this is a prototype for a C version. I'm avoiding object orientation and some Python good practices to make the code easier to port to C later.

    Before going much further, and regardless of whether you are going to use a parsing library or write from scratch, write up a BNF of the language you are planning to parse. Create a series of test cases from very simple to complex, and make sure the BNF describes them properly. This will serve as a roadmap as you implement your parser, and will help you know when you are done (or at least ready for your next language extension). Commented Feb 29, 2016 at 1:38
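As a rough sketch of what such a roadmap could look like for the language in the question: the grammar below is inferred from the posted parser code, so it is an assumption and not the asker's actual specification (it uses EBNF-style `*` and `?` for brevity, and it is stricter than the posted code about commas). The `accepts` helper and the `CASES` list are hypothetical names, included only so the test cases have something to run against.

```python
# Assumed grammar, inferred from the parser in the question (not an official spec).
# TOKEN stands for any single token other than "fn".
GRAMMAR = """
program   ::= statement*
statement ::= function | TOKEN
function  ::= "fn" NAME? "(" (NAME ("," NAME)*)? ")" block
block     ::= ":" statement* "end" | "{" statement* "}"
"""


def accepts(tokens):
    """Return True if the token list matches GRAMMAR, False otherwise."""
    tokens = list(tokens)
    pos = 0

    def take():
        nonlocal pos
        token = tokens[pos]          # IndexError here means premature end of input
        pos += 1
        return token

    def expect(condition):
        if not condition:
            raise ValueError("syntax error")

    def statement():
        if take() != 'fn':
            return                   # statement ::= TOKEN
        token = take()
        if token != '(':             # optional function NAME
            expect(token[0].isalpha())
            token = take()
        expect(token == '(')
        token = take()
        while token != ')':          # (NAME ("," NAME)*)?
            expect(token[0].isalpha())
            token = take()
            if token == ',':
                token = take()
                expect(token != ')')     # no trailing comma
            else:
                expect(token == ')')     # names must be comma-separated
        opener = take()
        expect(opener in (':', '{'))
        closer = 'end' if opener == ':' else '}'
        while tokens[pos] != closer:     # statement*
            statement()
        take()                           # consume the closer

    try:
        while pos < len(tokens):
            statement()
        return True
    except (IndexError, ValueError):
        return False


# Test cases ordered from very simple to more complex, plus two rejections.
CASES = [
    (['x'], True),
    (['fn', 'f', '(', ')', ':', 'end'], True),
    (['fn', '(', 'a', ',', 'b', ')', '{', '}'], True),
    (['fn', 'f', '(', 'a', ')', ':',
      'fn', 'g', '(', ')', ':', 'x', 'end', 'end'], True),
    (['fn', 'f', '(', 'a', ',', ')', ':', 'end'], False),   # trailing comma
    (['fn', 'f', '(', ')', ':'], False),                    # missing "end"
]

for case_tokens, expected in CASES:
    assert accepts(case_tokens) == expected, case_tokens
```

Keeping the cases in a plain list like this makes it easy to rerun them after every change to the parser, and the same list can later be replayed against the C implementation.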

2 Answers


There are a lot of parsers implemented in Python (https://wiki.python.org/moin/LanguageParsing), such as pyPEG, that let you describe the language you're parsing instead of writing the parsing code yourself, which is more readable and less error-prone.

Also, using global is typically a source of problems: there is only one copy of each variable, so you can't control its scope, and your functions become harder to reuse.

It's probably better to use a class, which is almost the same thing, except that you can have multiple instances of the same class running at the same time without variable collisions:

class Parser:
    def __init__(self, tokenList):
        self.tokens = tokenList
        self.size = len(tokenList)
        self.ast = []
        self.position = 0

    def parse(self):
        while self.position < self.size:
            self.ast.append(self.main())
        return self.ast

    def main(self):
        if self.tokens[self.position] == 'fn':
            self.position += 1
            node = self.function()
        else:
            node = self.tokens[self.position]
            self.position += 1
        return node

    # And so on...

From this point you can deduplicate self.position += 1 in main:

    def main(self):
        # Read the current token first, then advance exactly once.
        token = self.tokens[self.position]
        self.position += 1
        if token == 'fn':
            node = self.function()
        else:
            node = token
        return node

Then remove the useless "node" variable:

    def main(self):
        # Read the current token first, then advance exactly once.
        token = self.tokens[self.position]
        self.position += 1
        if token == 'fn':
            return self.function()
        return token
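To show how the remaining methods might fit together, here is a minimal runnable sketch of the class-based approach. It is not the asker's code: `FunctionNode` and `next_token` are hypothetical names, the token list assumes a tokenizer that emits no newline or indentation tokens, and unlike the posted code this version consumes the opening `:` or `{` instead of leaving it in the body.

```python
class FunctionNode:
    """Hypothetical stand-in for the question's VariableAST/FunctionAST pair."""

    def __init__(self, name=None):
        self.name = name
        self.args = []      # created per instance, so nodes never share lists
        self.body = []


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0

    def next_token(self):
        """Return the current token and advance; fail cleanly at end of input."""
        if self.position >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        token = self.tokens[self.position]
        self.position += 1
        return token

    def parse(self):
        ast = []
        while self.position < len(self.tokens):
            ast.append(self.main())
        return ast

    def main(self):
        token = self.next_token()
        if token == 'fn':
            return self.function()
        return token

    def function(self):
        node = FunctionNode()
        token = self.next_token()
        if token[0].isalpha():              # optional function name
            node.name = token
            token = self.next_token()
        if token != '(':
            raise SyntaxError("expecting '(' or function name")
        token = self.next_token()
        while token != ')':                 # argument names separated by commas
            if token[0].isalpha():
                node.args.append(token)
            elif token != ',':
                raise SyntaxError("expecting ')' or ',' in function declaration")
            token = self.next_token()
        opener = self.next_token()
        if opener not in (':', '{'):
            raise SyntaxError("expecting '{' or ':'")
        closer = 'end' if opener == ':' else '}'
        while True:                         # body: statements up to the closer
            if self.position >= len(self.tokens):
                raise SyntaxError("unexpected end of input in function body")
            if self.tokens[self.position] == closer:
                self.position += 1
                return node
            node.body.append(self.main())   # recursion handles nested functions


tokens = ['fn', 'test', '(', 'a', ',', 'b', ',', 'c', ')', ':',
          'fn', 'test2', '(', 'c', ',', 'b', ',', 'a', ')', ':',
          'print', '(', '"Hello world"', ')',
          'end', 'end']
outer = Parser(tokens).parse()[0]
inner = outer.body[0]
print(outer.name, outer.args)   # test ['a', 'b', 'c']
print(inner.name, inner.args)   # test2 ['c', 'b', 'a']
print(inner.body)               # ['print', '(', '"Hello world"', ')']
```

Because all state lives on the instance, two `Parser` objects can work through different token lists at the same time without interfering, which is the point of dropping the globals.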

But the real way to do this is to use a parser library. Take a look at pyPEG; it's a nice one, but there are other good ones too.

Oh and last point, avoid useless comments like:

# Parse a function
def FunctionParser():

We already know that "FunctionParser" parses a function, so the comment adds no information. What matters most is choosing your function names wisely (and note that PEP 8 tells us not to start function or method names with a capital letter). If you want to add meta-information about the function, put it in a string as the first statement of the function (a docstring), like:

def functionParser():
    "Delegates the body parsing to functionBodyParser"

5 Comments

Hello, I should have said this in the question (I will edit it): this is a prototype for a C version, where I don't have access to classes or a parser builder (I'm avoiding those because of the purpose of the language). Thanks anyway; I will use some of the tips you gave me.
@h0m3 Are you seriously telling me no parser exists in C? There are A LOT of them: en.wikipedia.org/wiki/Comparison_of_parser_generators. There are probably more parsers in C than in Python. Some (like ANTLR3) can even generate both C AND Python from the same description, so you can use the same grammar in your Python tests and in the real C implementation.
Of course, you're completely right. I want to write the parser myself for learning purposes and for code size (the language is pretty simple and I want to keep it as small as possible. I think that, not at the beginning but once I learn more, the parser can actually be smaller than one made by a parser generator). I'm trying to avoid using anything beyond the C library.
Smaller, but slower, unless you spend a few months or years learning about parsing (LR parsers, LL parsers, LALR, lookahead, PEG, and so on; there are many subjects to explore here). But I agree that it's an interesting subject to learn from, and you'll have plenty of room for rewriting and optimization.
Yeah. It will be pretty crappy at first. For now, I'm just learning the basics of LL parsers. But for learning purposes and future optimization, I think it's better to learn how to parse than to just use a parser generator.

I found the solution.

Actually, it was a silly Python programming error, and the parser above works fine.

In the AST module I was creating struct-like classes to make the code easier to port to C, but I was doing it wrong. For example:

class VariableAST(NodeAST):
    name = None
    value = None

What I didn't know about Python is that names declared directly in the class body aren't per-object attributes but class attributes, shared by every instance, much like static variables. That made my program behave unpredictably: when I modified a value through one object, I was modifying the shared class attribute, so every other object saw the change too, and that was the source of my infinite recursion of objects.

1 Comment

For them to be instance variables, you have to create a method, usually def __init__(self):, and assign them as self.name = None and self.value = None. This should be covered in most Python tutorials.
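A minimal sketch of the pitfall and of the `__init__` fix. The question does not show the FunctionAST class, so the class-level `args` and `body` lists below are an assumption about how it was declared, and `SharedFunctionAST` is a hypothetical name for the broken variant. Note that a plain assignment such as `node.name = x` creates a per-instance attribute; it is in-place mutation of a shared mutable value, such as appending to a class-level list, that leaks across instances.

```python
class SharedFunctionAST:
    # Class-level attributes: one list object shared by every instance.
    args = []
    body = []


class FunctionAST:
    def __init__(self):
        # Instance attributes: each node gets its own fresh lists.
        self.args = []
        self.body = []


# With class-level lists, an append through one node is visible through all.
outer, inner = SharedFunctionAST(), SharedFunctionAST()
outer.body.append(inner)
print(inner.body[0] is inner)   # True: the inner node now "contains" itself

# With lists created in __init__, the nodes stay independent.
outer, inner = FunctionAST(), FunctionAST()
outer.body.append(inner)
print(outer.body[0] is inner)   # True
print(inner.body)               # []
```

The first half reproduces the symptom from the question: following `body` from the nested node leads straight back to the same object, because both nodes are looking at the same list.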
