I'm writing a toolkit in Java that uses Java expression parsing. I thought I'd try using ANTLR since

  1. It seems to be used ubiquitously for this sort of thing
  2. There don't seem to be a lot of open source alternatives
  3. I actually tried to write my own generalized parser a while back and gave up. That stuff's hard.

I have to say, after what I feel is a lot of reading and trying different things (more than I had expected to spend, anyway), ANTLR seems incredibly difficult to use. The API is very unintuitive--I'm never quite sure whether I'm calling it right.

Although ANTLR tutorials and examples abound, I haven't had any luck finding examples that parse standalone Java "expressions" -- everyone else seems to want to parse whole Java files.

I started off calling it like this:

        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(text));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.expression();

but that wouldn't parse the whole expression. E.g. with text "a.b" it would return a result that only consisted of the "a" part, just quitting after the first thing it could parse.

Fine. So I changed to:

        String input = "return " + text + ";";
        Java8Lexer lexer = new Java8Lexer(CharStreams.fromString(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        Java8Parser parser = new Java8Parser(tokens);
        ParseTree result = parser.returnStatement();
        result = result.getChild(1);

thinking this would force it to parse the entire expression, then I could just extract the part I cared about. That worked for name expressions like "a.b", but if I try to parse a method expression like "a.b.c(d)" it gives an error:

line 1:12 mismatched input '(' expecting '.'

Interestingly, a(), a.b(), and a.b.c parse fine, but a.b.c() also dies with the same error.

Is there an ANTLR expert here who might have an idea what I'm doing wrong?

Separately, it bothers me quite a bit that the error above is printed to stderr, but I can't find it in the result object anywhere. I'd like to be able to present that error message (vague as it is) to the user that entered the expression--they may not be looking at a console, and even if they are, there's no context there. Is there a way to find that information in the result I get back?

Any help is greatly appreciated.

1 Answer

For a rule like expression, ANTLR will stop parsing once it recognizes an expression.

You can force it to continue by adding an EOF to your start rule.

You don’t want to modify the actual expression rule, but you can add a rule like this:

expressionStart: expression EOF;

Then you can use:

ParseTree result = parser.expressionStart();

This forces ANTLR to keep parsing until it reaches the end of your input.
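To see why the trailing EOF matters, here is a minimal sketch that does not use ANTLR at all. It is a hand-written toy for the tiny grammar expression : Identifier | Identifier '.' expression, and the class EofAnchorDemo and its method names are invented for this illustration. It deliberately takes the shortest match when nothing forces it to read on. ANTLR's real prediction algorithm is more sophisticated, so treat this only as an analogy for the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; this is not ANTLR and not the Java8 grammar. */
final class EofAnchorDemo {

    /** Splits text into identifiers and single-character punctuation tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    /**
     * expression : Identifier | Identifier '.' expression ;
     * Returns every token index at which an expression starting at 'start'
     * could end, shortest match first.
     */
    static List<Integer> expressionEnds(List<String> tokens, int start) {
        List<Integer> ends = new ArrayList<>();
        if (start < tokens.size() && Character.isJavaIdentifierStart(tokens.get(start).charAt(0))) {
            ends.add(start + 1); // alternative 1: a lone Identifier
            if (start + 1 < tokens.size() && tokens.get(start + 1).equals(".")) {
                ends.addAll(expressionEnds(tokens, start + 2)); // alternative 2
            }
        }
        return ends;
    }

    /** Analogous to calling the bare rule: stop at the first acceptable match. */
    static int parseExpression(String text) {
        List<Integer> ends = expressionEnds(tokenize(text), 0);
        return ends.isEmpty() ? -1 : ends.get(0);
    }

    /** Analogous to "expressionStart: expression EOF;": the match must reach end of input. */
    static int parseExpressionStart(String text) {
        List<String> tokens = tokenize(text);
        for (int end : expressionEnds(tokens, 0)) {
            if (end == tokens.size()) {
                return end;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(parseExpression("a.b"));      // tokens consumed without the end-of-input check
        System.out.println(parseExpressionStart("a.b")); // tokens consumed with the end-of-input check
    }
}
```

With the input a.b, parseExpression reports 1 token consumed (just a), while parseExpressionStart reports 3. An input with trailing junk such as a.b. is rejected by parseExpressionStart (it returns -1) instead of being silently truncated.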


re: returnStatement

When I run return a.b.c(); through the ANTLR Preview in IntelliJ, I get this parse tree:

[Image: ANTLR Preview parse tree for return a.b.c();]

A little bit of following the grammar rules, and I stumble across these rules:

typeName: Identifier | packageOrTypeName '.' Identifier;

packageOrTypeName
    : Identifier
    | packageOrTypeName '.' Identifier
    ;

That both rules include an alternative for packageOrTypeName '.' Identifier looks problematic to me.
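The overlap can be made concrete with a small sketch. The code below does not use ANTLR; the class NameRuleOverlap is hypothetical and written only for this illustration. It enumerates the token sequences each of the two rules above can derive, up to a bounded number of identifiers, writing Id for any Identifier token. It shows only that the two rules accept the same dotted names. It does not establish that this overlap is what causes the error.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy enumeration of the two rules quoted above; "Id" stands for any Identifier token. */
final class NameRuleOverlap {

    /** packageOrTypeName : Identifier | packageOrTypeName '.' Identifier ; */
    static Set<String> packageOrTypeName(int maxIdentifiers) {
        Set<String> derived = new LinkedHashSet<>();
        if (maxIdentifiers < 1) {
            return derived;
        }
        derived.add("Id"); // alternative 1
        for (String prefix : packageOrTypeName(maxIdentifiers - 1)) {
            derived.add(prefix + ".Id"); // alternative 2
        }
        return derived;
    }

    /** typeName : Identifier | packageOrTypeName '.' Identifier ; */
    static Set<String> typeName(int maxIdentifiers) {
        Set<String> derived = new LinkedHashSet<>();
        if (maxIdentifiers < 1) {
            return derived;
        }
        derived.add("Id"); // alternative 1
        for (String prefix : packageOrTypeName(maxIdentifiers - 1)) {
            derived.add(prefix + ".Id"); // alternative 2
        }
        return derived;
    }

    public static void main(String[] args) {
        System.out.println(typeName(3));
        System.out.println(packageOrTypeName(3));
    }
}
```

Both calls produce the same set, so a dotted name such as a.b.c fits either rule, and the parser has to rely on what surrounds the name to choose between them.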

In the tree, we see primaryNoNewArray_lfno_primary:2 which indicates a match of the second alternative in this rule:

primaryNoNewArray_lfno_primary
    : literal
    | typeName ('[' ']')* '.' 'class' // <-- trying to match this rule
    | unannPrimitiveType ('[' ']')* '.' 'class'
    | 'void' '.' 'class'
    | 'this'
    | typeName '.' 'this'
    | '(' expression ')'
    | classInstanceCreationExpression_lfno_primary
    | fieldAccess_lfno_primary
    | arrayAccess_lfno_primary
    | methodInvocation_lfno_primary
    | methodReference_lfno_primary
    ;

I'm out of time at the moment, but will keep looking at it. It seems pretty unlikely that there's a bug this obvious in Java8Parser.g4, but it certainly looks like one right now. I'm not sure what about the context would change how this is parsed (by context, I mean wherever returnStatement is normally invoked from within the grammar).

I tried this input (starting with the compilationUnit rule):

class Test {
    class A {
        public B b;
    }
    class B {
        String c() {
            return "";
        }
    }
    String test() {
        A a = new A();
        return a.b.c();
    }
}

And it parses correctly (so, we've not found a major bug in the Java8Parser grammar 😔):

[Image: parse tree for the class above, starting from compilationUnit]

Still, this doesn't seem right.

Getting closer:

If I start with the block rule, and wrap in curly braces ({return a.b.c();}), it parses fine.

I'm going to go with the theory that ANTLR needs a bit more lookahead to resolve an "ambiguity".
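To illustrate what "more lookahead" means here: in the primaryNoNewArray_lfno_primary rule above, a class literal (typeName '.' 'class') and a method invocation can both begin with the same dotted name, so the token that tells them apart arrives only after the whole name has been read. The sketch below is a hand-written classifier, not ANTLR. The class LookaheadDemo, its methods, and the labels it returns are all made up for this example, and it is a simplification that neither reproduces nor explains the error above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: how far one must read before the kind of expression is known. */
final class LookaheadDemo {

    /** Splits text into words and single-character punctuation tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    /** Treats any word except the keyword "class" as an Identifier (a simplification). */
    static boolean isIdentifier(String token) {
        return Character.isJavaIdentifierStart(token.charAt(0)) && !token.equals("class");
    }

    /** Index of the first token after a leading Identifier ('.' Identifier)*, or 0 if there is none. */
    static int nameEnd(List<String> tokens) {
        if (tokens.isEmpty() || !isIdentifier(tokens.get(0))) {
            return 0;
        }
        int i = 1;
        while (i + 1 < tokens.size() && tokens.get(i).equals(".") && isIdentifier(tokens.get(i + 1))) {
            i += 2;
        }
        return i;
    }

    /** Decides what the input is by inspecting the token(s) that follow the dotted name. */
    static String classify(String text) {
        List<String> tokens = tokenize(text);
        int i = nameEnd(tokens);
        if (i == 0) {
            return "other";
        }
        if (i == tokens.size()) {
            return "name";
        }
        if (tokens.get(i).equals("(")) {
            return "method invocation";
        }
        if (tokens.get(i).equals(".") && i + 1 < tokens.size() && tokens.get(i + 1).equals("class")) {
            return "class literal";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("a.b.c()"));
        System.out.println(classify("a.b.c.class"));
    }
}
```

For a.b.c() the deciding token ( is the sixth token, and for a longer name it comes later still, so no small fixed number of lookahead tokens settles the choice for every input.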


4 Comments

That's wonderful, thanks! Definitely seems to have gotten me past that. Do you know why the returnStatement() method didn't work though?
I don't see anything obvious about the returnStatement. I'll have to find a few minutes to pull things down and run a test.
I've added quite a lot of what I'm seeing while looking into returnStatement, but it's something of a puzzle at the moment. Does the same thing result in a valid parse tree if used in a larger context? (I just tried starting at the statement rule, and get the same "problem".)
Wow, thanks for checking that out. Didn't expect you to spend that much time on it--just wondered if you knew off the top of your head. Your original answer fixed my problem. I assume it must work in a larger context, as well used as this grammar is. Like you, I assume it can't be a bug we've exposed with input so simple.
