For some reason the while loop below runs only once: it picks up a NUMBER and then exits. Does anyone have any idea why it isn't lexing the rest of the string? The only input was 1 + 2. Any help is much appreciated!

public Lexer(String input) throws TokenMismatchException {
        tokens = new ArrayList<Token>();

        // Lexing logic begins here
        StringBuffer tokenPatternsBuffer = new StringBuffer();
        for (Type type : Type.values())
            tokenPatternsBuffer.append(String.format("|(?<%s>%s)", type.name(), type.pattern));
        Pattern tokenPatterns = Pattern.compile(new String(tokenPatternsBuffer.substring(1)));

        // Begin matching tokens
        Matcher matcher = tokenPatterns.matcher(input.replaceAll(" ", ""));
        while (matcher.find()) {
            if (matcher.group(Type.NUMBER.name()) != null) {
                tokens.add(new Token(Type.NUMBER, matcher.group(Type.NUMBER.name())));
                continue;
            } else if (matcher.group(Type.OPERATOR.name()) != null) {
                tokens.add(new Token(Type.OPERATOR, matcher.group(Type.OPERATOR.name())));
                continue;
            } else if (matcher.group(Type.UNIT.name()) != null) {
                tokens.add(new Token(Type.UNIT, matcher.group(Type.UNIT.name())));
                continue;
            } else if (matcher.group(Type.PARENTHESES.name()) != null) {
                tokens.add(new Token(Type.PARENTHESES, matcher.group(Type.PARENTHESES.name())));
                continue;
            } else {
                throw new TokenMismatchException();
            }
        }
    }

enum Type {
    NUMBER("[0-9]+.*[0-9]*"), OPERATOR("[*|/|+|-]"), UNIT("[in|pt]"), PARENTHESES("[(|)]");

    public final String pattern;

    private Type(String pattern) {
        this.pattern = pattern;
    }
}

1 Answer

This pattern:

"[0-9]+.*[0-9]*"

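matches one or more digits, followed by zero or more of any character, followed by zero or more digits. The dot is a special character in regexes that means "any character", so once the spaces are stripped from your input, the .* swallows the +2 and the whole string 1+2 becomes a single NUMBER token. That is why the loop runs only once. If you're trying to match a decimal point, you need to put a backslash before the dot: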

"[0-9]+\\.*[0-9]*"

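To see the difference concretely, here is a small sketch that runs both versions of the pattern over 1+2 (spaces already removed) and collects every match, the same way your while (matcher.find()) loop walks the input. The names DotDemo and allMatches are made up for illustration and are not part of your code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotDemo {
    // Collects every successive match of regex in input, mirroring
    // the while (matcher.find()) loop from the question.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Unescaped dot: ".*" greedily consumes "+2", so there is only one match.
        System.out.println(allMatches("[0-9]+.*[0-9]*", "1+2"));      // [1+2]
        // Escaped dot: each number is matched on its own. The "+" is skipped
        // because this standalone pattern has no operator alternative.
        System.out.println(allMatches("[0-9]+\\.*[0-9]*", "1+2"));    // [1, 2]
        System.out.println(allMatches("[0-9]+\\.*[0-9]*", "3.14+2")); // [3.14, 2]
    }
}
```

One caveat: \\.* accepts any number of dots, so 1..2 would also count as a single number. If that matters, \\.? (at most one dot) is stricter.
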
(The backslash is doubled because it's in a Java string literal.) It appears to work on "1 + 2" if that one fix is made.

However, some of your other patterns show some misunderstanding of what [] does in a regex. This is a "character class" that matches any one of the characters you list between the brackets, except that - can be used for a range of characters (like 0-9). So

"[*|/|+|-]"

matches any one of the characters *, |, /, +, - (the | does not mean "or" inside square brackets). The - isn't treated as a range operator here because it comes last, but it's probably best to get in the habit of putting a \ in front of it anyway, so you want

"[*/+\\-]"

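A quick way to check what a character class really accepts is to test single characters against it. This is only a sketch; CharClassDemo and accepts are hypothetical names, not anything from your code.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the whole input string matches the regex.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [], the | is an ordinary character, so the original class accepts it.
        System.out.println(accepts("[*|/|+|-]", "|")); // true
        // The cleaned-up class rejects | ...
        System.out.println(accepts("[*/+\\-]", "|"));  // false
        // ... and still accepts each of the four operators.
        for (String op : new String[] {"*", "/", "+", "-"}) {
            System.out.println(op + " -> " + accepts("[*/+\\-]", op)); // true each time
        }
    }
}
```
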
Similarly,

"[in|pt]"

matches one of the five characters i, n, |, p, t, which is certainly not what you want. You probably want

"(in|pt)"

which matches either "in" or "pt"; the parentheses may not be necessary in your case, but in a different case, they may be necessary to prevent some other characters from being included in one of the alternatives when the pattern is included in a larger string.
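Putting the pieces together, here is a rough sketch of how the enum and the matching loop might look with corrected patterns. Several things here are my assumptions rather than anything from your code: the NUMBER pattern (digits with an optional fractional part), the [()] class for PARENTHESES, the class name LexerSketch, and returning plain "TYPE:text" strings instead of your Token class, which isn't shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerSketch {
    enum Type {
        NUMBER("[0-9]+(?:\\.[0-9]+)?"), // assumed: optional fractional part
        OPERATOR("[*/+\\-]"),
        UNIT("in|pt"),
        PARENTHESES("[()]");            // assumed: | removed from the class

        final String pattern;

        Type(String pattern) {
            this.pattern = pattern;
        }
    }

    // Returns tokens as "TYPE:text" strings (a stand-in for the Token class).
    static List<String> lex(String input) {
        // Build one alternation with a named group per token type.
        StringBuilder sb = new StringBuilder();
        for (Type type : Type.values()) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(String.format("(?<%s>%s)", type.name(), type.pattern));
        }
        Pattern tokenPatterns = Pattern.compile(sb.toString());

        List<String> tokens = new ArrayList<>();
        Matcher matcher = tokenPatterns.matcher(input.replace(" ", ""));
        while (matcher.find()) {
            // Exactly one named group is non-null for each successful find().
            for (Type type : Type.values()) {
                String text = matcher.group(type.name());
                if (text != null) {
                    tokens.add(type.name() + ":" + text);
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("1 + 2")); // [NUMBER:1, OPERATOR:+, NUMBER:2]
        System.out.println(lex("(3.5in - 2pt) * 4"));
    }
}
```

Because each pattern is wrapped in its own named group, in|pt needs no extra parentheses here. Also note that find() silently skips over characters that match none of the alternatives, so in a loop like this the final else branch that throws TokenMismatchException is never reached; catching bad input would need a separate check.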
