Using the following ANTLR grammar: https://github.com/bkiers/python3-parser/blob/master/src/main/antlr4/nl/bigo/pythonparser/Python3.g4 I want to extract the variable names from a given expression, let's say:

x.split(y, 3)

or

x + y

In both cases I want to get the variables x and y. How would I achieve this?

I tried the following approach, but it seems cumbersome since I must list all built-in Python functions:

Define a listener and walk the parse tree:

const listener = new MyPythonListener()
antlr.tree.ParseTreeWalker.DEFAULT.walk(listener, abstractTree)

Use regex + pattern matching:

const symbolicNames = ['TRUE', 'FALSE', 'NUMBERS', 'STRING', 'LIST', 'TUPLE', 'DICTIONARY', 'INT', 'LONG', 'FLOAT', 'COMPLEX',
'BOOL', 'STR', 'INT', 'RANGE', 'NONE', 'LEN']

class MyPythonListener extends Python3Listener {
    variables = []

    enterExpr(ctx) {
        const text = this.getElementText(ctx)
        if (text && this.verifyIsVariable(text)) {
            this.variables.push(text)
        }
    }

    verifyIsVariable(leafText) {
        return !leafText.includes('"') && !leafText.includes('\'') && isNaN(leafText) &&
            !symbolicNames.includes(leafText.toUpperCase()) && leafText.match(/^[0-9a-zA-Z_]+$/)
    }
}
  • You can't use that grammar to extract variables. You can create an ANTLR grammar based on the grammar/specification you linked to and then use that ANTLR grammar to extract variables. The ANTLR grammar is most likely not a 1-to-1 translation of the specification, so there is no answer to your question without seeing the ANTLR grammar. So, could you post your ANTLR grammar? Commented Jan 20, 2021 at 14:46
  • Btw, it might be easier to use Python's own parser/ast package to retrieve such things from Python code: docs.python.org/3/library/ast.html Commented Jan 20, 2021 at 14:49
  • Thanks for responding, this is the ANTLR grammar I am using: github.com/bkiers/python3-parser - thanks for sharing this as open source Commented Jan 20, 2021 at 14:59
  • You're welcome. In the README in that repository, I link to a class that gives an example how to extract things from the parse tree. Could you edit your own question and add what you have tried yourself? Commented Jan 20, 2021 at 15:31
  • @BartKiers I edited the question and added one approach I tried, using a listener + pattern matching. I also tried another variant, generating a simplified tree and collecting the leaves, but it doesn't look promising. Any suggestion on how you would tackle this is welcome, since I don't have enough experience to get started and I can't find appropriate guidance anywhere. Thank you! Commented Jan 21, 2021 at 10:04
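
As a rough sketch of the `ast` suggestion from the comments above (not the ANTLR-based approach the question asks about): Python's standard `ast` module represents every identifier used as a value as an `ast.Name` node, while method names such as `split` are stored as the `attr` string of an `ast.Attribute` node and so are not reported. The function name `variable_names` and the choice to filter out everything listed in the `builtins` module are assumptions made for illustration.

```python
import ast
import builtins


def variable_names(source):
    """Return identifiers used as variables in `source`, in order of first use."""
    tree = ast.parse(source)
    # Assumption: any name defined in the builtins module (len, str, range, ...)
    # is treated as a built-in rather than a user variable.
    builtin_names = set(dir(builtins))

    # ast.walk does not guarantee source order, so sort by position.
    name_nodes = sorted(
        (node for node in ast.walk(tree) if isinstance(node, ast.Name)),
        key=lambda node: (node.lineno, node.col_offset),
    )

    result = []
    for node in name_nodes:
        if node.id not in builtin_names and node.id not in result:
            result.append(node.id)
    return result


print(variable_names("x.split(y, 3)"))  # ['x', 'y']
print(variable_names("x + y"))          # ['x', 'y']
print(variable_names("len(x) + y"))     # ['x', 'y']
```

This avoids maintaining a hand-written list of built-in names, at the cost of also hiding any user variable that shadows a built-in (for example a variable named `id`).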

1 Answer

I didn't look too closely at it, but after inspecting the parse tree for the Python code:

def some_method_name(some_param_name):
    x.split(y, 3)

it appears that the variable names are children of the atom rule:

atom
 : '(' ( yield_expr | testlist_comp )? ')' 
 | '[' testlist_comp? ']'  
 | '{' dictorsetmaker? '}' 
 | NAME 
 | number 
 | str+ 
 | '...' 
 | NONE
 | TRUE
 | FALSE
 ;

where NAME is a variable name.

So you could do something like this:

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

ParseTreeWalker.DEFAULT.walk(new Python3BaseListener() {
    @Override
    public void enterAtom(Python3Parser.AtomContext ctx) {
        if (ctx.NAME() != null) {
            System.out.println(ctx.NAME().getText());
        }
    }
}, parser.file_input());

which will print:

x
y

and not the method and parameter names.

Again, this is not thoroughly tested; I leave that to you. You can pretty-print the parse tree like this:

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

System.out.println(new Builder.Tree(source).toStringASCII());

to see for yourself where the nodes you're interested in occur in the parse tree. The code above will print:

'- file_input
   |- stmt
   |  '- compound_stmt
   |     '- funcdef
   |        |- def
   |        |- some_method_name
   |        |- parameters
   |        |  |- (
   |        |  |- typedargslist
   |        |  |  '- tfpdef
   |        |  |     '- some_param_name
   |        |  '- )
   |        |- :
   |        '- suite
   |           |- <NEWLINE>
   |           |- <INDENT>
   |           |- stmt
   |           |  '- simple_stmt
   |           |     |- small_stmt
   |           |     |  '- expr_stmt
   |           |     |     '- testlist_star_expr
   |           |     |        '- test
   |           |     |           '- or_test
   |           |     |              '- and_test
   |           |     |                 '- not_test
   |           |     |                    '- comparison
   |           |     |                       '- star_expr
   |           |     |                          '- expr
   |           |     |                             '- xor_expr
   |           |     |                                '- and_expr
   |           |     |                                   '- shift_expr
   |           |     |                                      '- arith_expr
   |           |     |                                         '- term
   |           |     |                                            '- factor
   |           |     |                                               '- power
   |           |     |                                                  |- atom
   |           |     |                                                  |  '- x
   |           |     |                                                  |- trailer
   |           |     |                                                  |  |- .
   |           |     |                                                  |  '- split
   |           |     |                                                  '- trailer
   |           |     |                                                     |- (
   |           |     |                                                     |- arglist
   |           |     |                                                     |  |- argument
   |           |     |                                                     |  |  '- test
   |           |     |                                                     |  |     '- or_test
   |           |     |                                                     |  |        '- and_test
   |           |     |                                                     |  |           '- not_test
   |           |     |                                                     |  |              '- comparison
   |           |     |                                                     |  |                 '- star_expr
   |           |     |                                                     |  |                    '- expr
   |           |     |                                                     |  |                       '- xor_expr
   |           |     |                                                     |  |                          '- and_expr
   |           |     |                                                     |  |                             '- shift_expr
   |           |     |                                                     |  |                                '- arith_expr
   |           |     |                                                     |  |                                   '- term
   |           |     |                                                     |  |                                      '- factor
   |           |     |                                                     |  |                                         '- power
   |           |     |                                                     |  |                                            '- atom
   |           |     |                                                     |  |                                               '- y
   |           |     |                                                     |  |- ,
   |           |     |                                                     |  '- argument
   |           |     |                                                     |     '- test
   |           |     |                                                     |        '- or_test
   |           |     |                                                     |           '- and_test
   |           |     |                                                     |              '- not_test
   |           |     |                                                     |                 '- comparison
   |           |     |                                                     |                    '- star_expr
   |           |     |                                                     |                       '- expr
   |           |     |                                                     |                          '- xor_expr
   |           |     |                                                     |                             '- and_expr
   |           |     |                                                     |                                '- shift_expr
   |           |     |                                                     |                                   '- arith_expr
   |           |     |                                                     |                                      '- term
   |           |     |                                                     |                                         '- factor
   |           |     |                                                     |                                            '- power
   |           |     |                                                     |                                               '- atom
   |           |     |                                                     |                                                  '- number
   |           |     |                                                     |                                                     '- integer
   |           |     |                                                     |                                                        '- 3
   |           |     |                                                     '- )
   |           |     '- <NEWLINE>
   |           '- <DEDENT>
   '- <EOF>

Note that the Builder.Tree class is not part of the ANTLR library. It lives in my repository, the one you linked to in your question: https://github.com/bkiers/python3-parser/blob/master/src/main/java/nl/bigo/pythonparser/Builder.java
