
I am using the Python3 grammar from the following location:

https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4

I have the code below to parse a file:

ANTLRInputStream input = new ANTLRInputStream(new FileInputStream("Functions.py"));
Python3Lexer lexer = new Python3Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Python3Parser parser = new Python3Parser(tokens);

ParseTree tree = parser.funcdef(); //Not sure what to do here
ParseTreeWalker walker = new ParseTreeWalker();

walker.walk(new Listener(), tree);

Listener.java

public class Listener extends Python3BaseListener{
    @Override
    public void enterImport_name(Python3Parser.Import_nameContext ctx) { 
        System.out.println(ctx.getText());
    }

    @Override 
    public void enterFuncdef(Python3Parser.FuncdefContext ctx) { 
        System.out.println(ctx.getText()); // returns the whole code as a string
    }   
}

I am trying to read all the imports, variables, and method names (along with their arguments) from the Python file.

How can I do this?

1 Answer

This is not a trivial problem. As a general way to write listeners, I would recommend you get code to print out the parse tree, add that to your program, and try a few different source files. Then, you can decide how to write the listeners and for what nodes.
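
A minimal sketch of such a tree printer follows. To stay runnable without the generated parser, it uses a hypothetical `Node` class as a stand-in for ANTLR's parse-tree nodes; `TreeDump`, `Node`, and `dump` are illustrative names, not part of ANTLR. With the real runtime, the same recursion would go through `getChildCount()` and `getChild(i)`, or you could call `tree.toStringTree(parser)` for a one-line LISP-style dump.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeDump {
    // Hypothetical stand-in for an ANTLR parse-tree node (not ANTLR API).
    static class Node {
        final String label;                       // rule name or token text
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    // Render one node per line, indenting two spaces per tree level.
    static String dump(Node node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.label).append('\n');
        for (Node child : node.children) {
            sb.append(dump(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built fragment shaped like the sub-tree for "import collections".
        Node tree = new Node("import_name",
            new Node("import"),
            new Node("dotted_as_names",
                new Node("dotted_as_name",
                    new Node("dotted_name",
                        new Node("collections")))));
        System.out.print(dump(tree, 0));
    }
}
```

Running a dump like this over a few different source files shows which rule names actually wrap the constructs you care about, which in turn tells you which `enterXxx` methods to override.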

For example, for https://github.com/antlr/grammars-v4/blob/master/python3/examples/base_events.py, the first import sub-tree looks like this:

  ( stmt
    ( simple_stmt
      ( small_stmt
        ( import_stmt
          ( import_name
            ( TOKEN i=2 t=import
            ) 
            ( dotted_as_names
              ( dotted_as_name
                ( dotted_name
                  ( HIDDEN text=\ 
                  ) 
                  ( TOKEN i=4 t=collections
      ) ) ) ) ) ) ) 
      ( TOKEN i=5 t=\r\n
  ) ) ) 

You will need to look at the grammar and verify that your examples really cover it. For base_events.py, import_from is not exercised (see https://www.geeksforgeeks.org/import-module-python/ for that syntax), so you'll have to write some examples that use it. Given what you said and what I see, I'd create a listener for the dotted_as_name context, verify that it sits under an import_stmt (in the tree above it is nested under dotted_as_names and import_name, so import_stmt is an ancestor rather than the direct parent), then just get the first child's text. enterImport_name() is a good choice if you don't mind "import", "as", and commas also appearing in the string returned from getText().
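
The dotted_as_name approach can be sketched as follows. This is only an illustration under assumptions: `ImportSketch`, `Node`, `hasAncestor`, and `importedModules` are hypothetical names, and `Node` is a hand-rolled stand-in so the example runs without the generated Python3Parser. In a real listener, the corresponding calls on the context object would be `getParent()`, `getChild(0)`, and `getText()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportSketch {
    // Hypothetical stand-in for a parse-tree node (not ANTLR API).
    // Leaves hold token text in 'rule'; inner nodes hold a rule name.
    static class Node {
        final String rule;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String rule, Node... kids) {
            this.rule = rule;
            for (Node k : kids) {
                k.parent = this;
                children.add(k);
            }
        }

        // Concatenated text of all leaves below this node, analogous to getText().
        String text() {
            if (children.isEmpty()) {
                return rule;
            }
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                sb.append(c.text());
            }
            return sb.toString();
        }
    }

    // Walk up the parent chain looking for a node with the given rule name.
    static boolean hasAncestor(Node node, String rule) {
        for (Node p = node.parent; p != null; p = p.parent) {
            if (p.rule.equals(rule)) {
                return true;
            }
        }
        return false;
    }

    // Collect the first child's text of every dotted_as_name under an import_stmt.
    static List<String> importedModules(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.rule.equals("dotted_as_name") && hasAncestor(node, "import_stmt")) {
            out.add(node.children.get(0).text());
        }
        for (Node c : node.children) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for: import os.path as p, collections
        Node tree = new Node("import_stmt",
            new Node("import_name",
                new Node("import"),
                new Node("dotted_as_names",
                    new Node("dotted_as_name",
                        new Node("dotted_name",
                            new Node("os"), new Node("."), new Node("path")),
                        new Node("as"),
                        new Node("p")),
                    new Node(","),
                    new Node("dotted_as_name",
                        new Node("dotted_name", new Node("collections"))))));
        System.out.println(importedModules(tree));
    }
}
```

Taking only the first child skips the "as" alias, which is why this yields bare module names rather than the full import text.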

But I think you get the picture.
