4

I've used the Java Compiler Tree API to generate the AST for Java source files. However, I'm unable to access the comments in the source files.

So far, I haven't found a way to extract comments from a source file. Is there a way to do this using the compiler API or some other tool?

0

5 Answers

5

Our SD Java Front End is a Java parser that builds ASTs (and optionally symbol tables). It captures comments directly on tree nodes.

The Java Front End is a member of a family of compiler language front ends (C, C++, C#, COBOL, JavaScript, ...), all of which are supported by the DMS Software Reengineering Toolkit. DMS is designed to process languages for the purposes of transformation, and thus can capture comments, layout and formats to enable regeneration of code that preserves the original layout as much as possible.

EDIT 3/29/2012: (in contrast to answer posted for doing this with ANTLR)

To get a comment on an AST node in DMS, one calls the DMS (lisp-like) function

  (AST:GetComments <node>)

which provides access to the array of comments associated with the AST node. One can ask for the length of this array (it may be null) or, for each array element, ask for any of these properties with (AST:Get...): FileIndex, Line, Column, EndLine, EndColumn, String (the exact Unicode comment content).

1 Comment

@Code freak - If this is helpful, you should actually give Ira Baxter an upvote, not just comment "+1".
5

The comments obtained through the getCommentList method of CompilationUnit do not carry the comment body. Comments are also not visited during a normal AST visit. In order to visit them, we have to call the accept method on each comment in the comment list.

for (Comment comment : (List<Comment>) compilationUnit.getCommentList()) {

    comment.accept(new CommentVisitor(compilationUnit, classSource.split("\n")));
}

The body of each comment can be recovered with some simple logic. The AST visitor for comments below takes the compilation unit and the source code of the class, split into lines, at construction.

import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.BlockComment;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.LineComment;

public class CommentVisitor extends ASTVisitor {

    CompilationUnit compilationUnit;

    private String[] source;

    public CommentVisitor(CompilationUnit compilationUnit, String[] source) {

        super();
        this.compilationUnit = compilationUnit;
        this.source = source;
    }

    public boolean visit(LineComment node) {

        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
        String lineComment = source[startLineNumber].trim();

        System.out.println(lineComment);

        return true;
    }

    public boolean visit(BlockComment node) {

        int startLineNumber = compilationUnit.getLineNumber(node.getStartPosition()) - 1;
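        // Use the comment's last character; start + length is one past the end and can
        // fall outside the source when the comment closes the file.
        int endLineNumber = compilationUnit.getLineNumber(node.getStartPosition() + node.getLength() - 1) - 1;

        StringBuilder blockComment = new StringBuilder();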

        for (int lineCount = startLineNumber ; lineCount<= endLineNumber; lineCount++) {

            String blockCommentLine = source[lineCount].trim();
            blockComment.append(blockCommentLine);
            if (lineCount != endLineNumber) {
                blockComment.append("\n");
            }
        }

        System.out.println(blockComment.toString());

        return true;
    }

    public void preVisit(ASTNode node) {

    }
}

Edit: Moved splitting of source out of the visitor.
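
A minimal sketch of an alternative that avoids the per-line lookup. It assumes that a comment node's getStartPosition() and getLength() cover the whole comment including its delimiters, so the exact text can be cut from the unsplit source string. The class and method names (CommentText, slice) are hypothetical, and the sketch deliberately has no JDT dependency: the caller passes in the two integers.

```java
/**
 * Hypothetical helper: cuts the exact comment text out of the full source string.
 * The offsets are assumed to come from node.getStartPosition() and node.getLength()
 * on a JDT comment node; nothing here calls JDT itself.
 */
final class CommentText {

    private CommentText() {
    }

    /**
     * @param source the complete, unsplit source text of the compilation unit
     * @param start  zero-based character offset of the comment's first character
     * @param length number of characters the comment spans
     * @return the comment exactly as written, delimiters included
     */
    static String slice(String source, int start, int length) {
        if (start < 0 || length < 0 || start + length > source.length()) {
            throw new IllegalArgumentException(
                    "Range [" + start + ", " + (start + length) + ") is outside the source");
        }
        return source.substring(start, start + length);
    }
}
```

Inside the visitor this would correspond to something like CommentText.slice(classSource, node.getStartPosition(), node.getLength()), with the visitor holding the whole source string instead of an array of lines. Because the range belongs to one comment node, two comments on the same line come back as separate strings.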

5 Comments

This is really the answer? So, for each comment, you blast the source text into N lines (I suppose you could do this just once for a set) and pick out the Nth? Does that really get the comment, or just the raw line containing the comment? What happens if there are 2 comments in the same line?
You are right, the code gives you the line that contains the comment. But you can still implement the logic to extract the comment or comments from that raw line. Having something is better than nothing. Also, to avoid splitting the source every time, the source can be split once and the resulting array passed to the visitor.
An arcane example: /* c1 * / foo /* c2 */ bar // c3 ... so if I have the tree for bar, how do I know which of these comments apply?
I agree this process is not effective when we have both comments and code on the same line. Maybe we can do another string manipulation that searches for '//' and '/*' instances in that line, but that now seems highly buggy.
Our experience is BIBSEH: "Because In Big Systems, Everything Happens". Agreed, this arcane example is likely rare. But, BIBSEH, so it will occur, usually at the most inconvenient moment.
1

Just for the record. Now with Java 8 you have a whole interface to play with comments and documentation details here.
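
The link target is not preserved in this text. As a hedged sketch only, and assuming the interface meant is com.sun.source.util.DocTrees from the Compiler Tree API (added in Java 8), the following lists documentation comments through the inherited getDocComment method. As far as I know, this route exposes only `/** ... */` doc comments, not `//` or `/* ... */` comments. The names DocCommentLister, StringSource and docComments are made up for illustration, and the code needs a JDK rather than a bare JRE.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DocCommentLister {

    /** In-memory source file, so the sketch needs nothing on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source and returns "name: doc text" for each documented class or method. */
    static List<String> docComments(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler available; run on a JDK");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                Collections.singletonList(new StringSource(className, code)));
        final DocTrees docTrees = DocTrees.instance(task);
        final List<String> found = new ArrayList<>();
        try {
            // parse() only builds the trees; no attribution or code generation happens.
            for (CompilationUnitTree unit : task.parse()) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void unused) {
                        collect(node.getSimpleName().toString());
                        return super.visitClass(node, unused);
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void unused) {
                        collect(node.getName().toString());
                        return super.visitMethod(node, unused);
                    }

                    private void collect(String name) {
                        // Returns null when the declaration has no doc comment.
                        String doc = docTrees.getDocComment(getCurrentPath());
                        if (doc != null) {
                            found.add(name + ": " + doc.trim());
                        }
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

Calling DocCommentLister.docComments("Demo", source) on a class with one documented method should yield one entry for the class (if it has a doc comment) and one for that method; declarations preceded only by ordinary comments are skipped.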

0

You might want to use a different tool, like ANTLR's Java grammar. javac has no use for comments, and is likely to discard them completely. The parsers upon which tools like IDEs are built are more likely to retain comments in their AST.

2 Comments

As far as I know, NetBeans uses the Java tree API for syntax highlighting, code completion, checking, etc.
Check out its formatter. That's going to use a tree parser to correctly handle the code, but it also formats comments. Also, ANTLR has special support for comment nodes, so I'm pretty sure its Java grammar will handle this case.
0

I managed to solve the problem by using getsourceposition() and some string manipulation (no regex was needed).
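
The poster's code is not shown, so the following is not their solution. It is a minimal sketch of the string-manipulation half under my own assumptions: a small hand-written scan, with no regex, that pulls every comment out of a piece of source text while skipping string and character literals. It could be applied to the whole file or to a substring delimited by source positions. The names CommentScanner and extractComments are hypothetical, and text blocks and Unicode escapes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentScanner {

    private CommentScanner() {
    }

    /** Returns every comment in the text, in order of appearance, delimiters included. */
    static List<String> extractComments(String src) {
        List<String> comments = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip literals so that "//" inside a string is not taken for a comment.
                i = skipLiteral(src, i, c);
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: runs to the end of the line (or of the text).
                int end = src.indexOf('\n', i);
                if (end < 0) {
                    end = n;
                }
                comments.add(src.substring(i, end).trim()); // trim drops a trailing \r
                i = end;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block or doc comment: runs to the closing delimiter, or to the end if unterminated.
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                comments.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    /** Returns the index just past the literal that opens at start. */
    private static int skipLiteral(String src, int start, char quote) {
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
            } else if (c == quote || c == '\n') {
                return i + 1; // closing quote, or give up at end of line
            } else {
                i++;
            }
        }
        return src.length();
    }
}
```

Because the scan works on characters rather than lines, a line such as `/* c1 */ foo /* c2 */ bar // c3` produces three separate entries.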
