0

I am currently developing a corrector for Java in my text editor. I think the best way to do this is to use Pattern to look for elements of Java syntax (import or package declarations, class or method declarations, ...). I have already written some of these patterns:

private String regimport = "^import(\\s+)(static |)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$",
               // groups instead of character classes, so that "a.b.c" is required rather than any mix of letters, '+' and '.'
               regpackage = "^package(\\s+)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$",
               regclass = "^((public(\\s+)abstract)|(abstract)|(public)|(final)|(public(\\s+)final)|)(\\s+)class(\\s+)(\\w+)(((\\s+)(extends|implements)(\\s+)(\\w+))|)(\\s*)(\\{)?(\\s*)$";

It has not been very difficult so far, but I am afraid it will take a long time to finish. Does anyone know whether something similar already exists?

  • I highly doubt that Java syntax can be fully expressed in a regex. Commented Oct 3, 2012 at 17:19
  • I feel your approach to building a Java corrector is wrong. You need not parse the language; you have to tokenize it first and then check its syntax. That will make your life easier. Commented Oct 3, 2012 at 17:24
  • Java is not a regular language (although Java regex isn't regular either), so you are in dire straits right there. But I believe there are Java parsers written in Java out there. Commented Oct 3, 2012 at 17:24
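One parser written in Java that ships with every JDK is javac itself, reachable through the standard javax.tools API. The following is only a minimal sketch of that idea, not something proposed in the comments above: the class name SyntaxCheckSketch and the helpers StringSource and syntaxErrors are hypothetical, and the cast to com.sun.source.util.JavacTask assumes the JDK's own compiler is the one returned by ToolProvider.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import com.sun.source.util.JavacTask; // JDK-specific API (assumption: running on a JDK)

class SyntaxCheckSketch {

    /** Hypothetical helper: keeps source text in memory so no file on disk is needed. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source without generating code and returns one message per error. */
    static List<String> syntaxErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a plain JRE
        if (compiler == null) {
            throw new IllegalStateException("a JDK is required");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(null, null, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, code)));
        try {
            task.parse(); // parsing only: no attribution, no class files
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(Locale.ENGLISH));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Deliberately broken source: the initializer of x is missing.
        for (String error : syntaxErrors("B", "class B { void f() { int x = ; } }")) {
            System.out.println(error);
        }
    }
}
```

Because only the parse phase runs, this reports syntax errors but not type errors or unresolved imports.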

3 Answers

2

To do so I think the best way is to use Pattern to look for element of java syntax

Incorrect. Regular expression patterns cannot adequately identify Java syntax elements. That is why much more complex parsers exist. As a simple example, imagine how you would avoid a false match on a reserved word inside a comment, such as the following:

/* this is not importing anything
import java.util.*;
*/
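The false match can be reproduced with the import pattern from the question. This is a small sketch; the class name ImportRegexDemo and the method countImportLines are made up for illustration. Note that the question's pattern does not accept the wildcard form `java.util.*`, so the sketch puts `import java.util.List;` inside the comment instead.

```java
import java.util.regex.Pattern;

class ImportRegexDemo {

    // The import pattern from the question, applied to one line at a time.
    static final Pattern REG_IMPORT =
            Pattern.compile("^import(\\s+)(static |)(\\w+\\.)*(\\w+)(\\s*);(\\s*)$");

    /** Counts the lines that the pattern reports as import declarations. */
    static int countImportLines(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            if (REG_IMPORT.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "/* this is not importing anything\n"
                + "import java.util.List;\n"
                + "*/\n"
                + "class A {}\n";
        // The file contains no real import, yet one line is reported.
        System.out.println(countImportLines(source)); // prints 1
    }
}
```

A line-by-line pattern has no notion of being inside a block comment, so it cannot tell this line from a real declaration.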

But if you are very keen to use regular expressions and willing to spend a lot of effort, look at Emacs font-lock-mode, which uses regular expressions to identify and fontify syntax elements.

PS: The "lot of effort" I mention refers to learning how Emacs works, reading elisp code, and translating Emacs regexps to Java. If you already know all that, you will need less effort.

1

Thank you all for your answers. I think I'm going to work with javaparser AST, it will be a lot easier :)

Here is some code that checks for errors with the AST:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.eclipse.jdt.core.compiler.IProblem;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.CompilationUnit;

public class Main {

    public static void main(String[] args) {

        // Read the whole source file (assumed to be UTF-8); stop if it cannot be read,
        // because the parser has nothing to work on without a source.
        String text;
        try {
            text = new String(
                    Files.readAllBytes(Paths.get("/root/java/Animbis.java")), // your personal java source file
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // AST.JLS2 targets older Java syntax, so newer language features
        // may be reported as problems at this level.
        ASTParser parser = ASTParser.newParser(AST.JLS2);
        parser.setSource(text.toCharArray());

        // Parse the file
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        // Print every problem the parser found, labelled as error or warning
        for (IProblem problem : unit.getProblems()) {
            String msg = problem.getMessage() + " line: " + problem.getSourceLineNumber();
            if (problem.isError()) {
                msg = "Error:\n" + msg;
            } else if (problem.isWarning()) {
                msg = "Warning:\n" + msg;
            }
            System.out.println(msg);
        }
    }
}

To run it, the following jars are needed on the classpath:

org.eclipse.core.contenttype.jar
org.eclipse.core.jobs.jar
org.eclipse.core.resources.jar
org.eclipse.core.runtime.jar
org.eclipse.equinox.common.jar
org.eclipse.equinox.preferences.jar
org.eclipse.jdt.core.jar
org.eclipse.osgi.jar

I got this information from "Eclipse ASTParser" and "Example of ASTParser".

0

Java's complete syntax cannot be parsed by RegEx. They belong to different classes of language: Java is at least a Chomsky type 2 language, whereas RegEx is type 3, and type 2 is fundamentally more complex than type 3. See also this famous answer about parsing HTML with RegEx; it's essentially the same problem.
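To make the difference concrete: checking that braces nest correctly to any depth requires counting, which a strictly regular (type 3) expression cannot do, while a few lines with a counter can. The sketch below is only an illustration; NestingDemo and bracesBalanced are hypothetical names, and the check deliberately ignores braces inside string literals and comments.

```java
class NestingDemo {

    /** Returns true when every '{' is closed by a later '}' and none closes too early. */
    static boolean bracesBalanced(String text) {
        int depth = 0; // the "memory" a finite automaton lacks
        for (char c : text.toCharArray()) {
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    return false; // a closing brace with nothing open
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(bracesBalanced("class A { void f() { } }")); // prints true
        System.out.println(bracesBalanced("class A { void f() { }"));   // prints false
    }
}
```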

2 Comments

Java is not type 2, since it's definitely not context-free. But then again, modern regex engines (Java's included) are not regular either. And that "famous answer" has become such a tired meme... If you know what you're doing, you can go far with regex: stackoverflow.com/questions/4284176/…
True. I edited "probably" to "at least". And regardless, RegEx is definitely NOT the way to do this.
