
I have this piece of code where I insert a Pattern key and a String token into a HashMap:

while( (word = reservedWordsRead.readLine()) != null ) {
    String[] k = word.split(" ");
    infoList.put(Pattern.compile("^("+k[0]+")"), //lexeme
                        k[1]); //token
}

It reads from a file that looks like this:

) rparen
( lparen

but the parentheses aren't recognized, so I modified the file to look like this:

\\) rparen
\\( lparen

and the code like this:

while( (word = reservedWordsRead.readLine()) != null ) {
    String[] k = word.split(" ");
    infoList.put(Pattern.compile("^("+Pattern.quote(k[0])+")"), //lexeme
                        k[1]); //token
}

But I don't get the proper output: nothing matches. The rparen and lparen entries are definitely in the HashMap, because my tokenize() method prints the following:

pattern: ^(\Q\\)\E), token: rparen
pattern: ^(\Q\\(\E), token: lparen

This is my tokenize method:

public void tokenize(String str) {
    String s = str.trim();
    tokenList.clear();

    while (!s.equals("")) {
        boolean match = false;
        for ( Entry<Pattern,String> thing: infoList.entrySet() ) {
            System.out.println("pattern: "+thing.getKey().toString()+", token: "+thing.getValue());
            Matcher m = thing.getKey().matcher(s);
            if (m.find()) {
                match = true;
                String tok = m.group().trim();
                s = m.replaceFirst("").trim();
                tokenList.put(tok,thing.getValue());
                break;
            }
        }
        if (!match)
            throw new ParserException("Unexpected character in input: "+s);
    }
}

I'm not sure what I'm doing wrong. I'd gladly appreciate your help :)


2 Answers


You should use Pattern.quote() if you want to match exact strings.

The problem you ran into is that you're both quoting the passed string and escaping the parentheses, which amounts to a double escape (reminiscent of &amp;amp; in HTML). While you could put all the special escape characters in your input file, why bother? Let Pattern do the work for you.
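To make the double escape concrete, here is a minimal sketch that builds the key the way the question's second loop does (split on a space, then "^(" + Pattern.quote(lexeme) + ")"). The names DoubleEscapeDemo, keyFor and matchesParen are hypothetical, and the file lines are written as Java literals, so the file text \\) rparen appears as "\\\\) rparen":

```java
import java.util.regex.Pattern;

// Hypothetical demo: reproduces the question's double escape.
class DoubleEscapeDemo {
    // Builds the key like the question's second loop: ^( quoted lexeme )
    static Pattern keyFor(String fileLine) {
        String lexeme = fileLine.split(" ")[0];
        return Pattern.compile("^(" + Pattern.quote(lexeme) + ")");
    }

    // Does the key built from this file line match the input ")"?
    static boolean matchesParen(String fileLine) {
        return keyFor(fileLine).matcher(")").find();
    }

    public static void main(String[] args) {
        // File text \\) rparen (backslashes already in the file, then quoted again)
        String doubled = "\\\\) rparen";
        System.out.println(keyFor(doubled) + " matches \")\"? " + matchesParen(doubled));
        // File text ) rparen (plain lexeme, quote() does the escaping)
        String plain = ") rparen";
        System.out.println(keyFor(plain) + " matches \")\"? " + matchesParen(plain));
    }
}
```

The first key prints as ^(\Q\\)\E), the same pattern shown in the question, and it only matches the literal text \\), never a bare ).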

Here's a test that takes several different inputs and turns each one into a Pattern the way you do, once as-is and once with Pattern.quote().

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTest
{
    private static final String[] TESTS = {"a","(","\\(","\\\\(","[letters]"};

    public static void main(String[] args) {
        for(String test : TESTS) {
            examineRegex(test);
            System.out.println();
        }
    }

    public static void examineRegex(String match) {
        System.out.println("Testing "+match);
        String template = "^(%s)";
        String regex = String.format(template, match);
        examinePattern(match, regex);
        String quotedRegex = String.format(template, Pattern.quote(match));
        examinePattern(match, quotedRegex);
    }

    public static void examinePattern(String match, String regex) {
        try {
            Pattern pattern = Pattern.compile(regex);
            System.out.println("  Compiled:  "+pattern);
            System.out.println("  Match?:    "+pattern.matcher(match).matches());
        } catch (PatternSyntaxException e) {
            System.out.println("  Failed to compile: "+e.getMessage()
                .substring(0, e.getMessage().indexOf('\n')));
        }
    }
}

The output of this program is below (comments inline):

Testing a
  Compiled:  ^(a)
  Match?:    true
  Compiled:  ^(\Qa\E)
  Match?:    true

For the simple case of a "normal" string, both your original method and Pattern.quote() work. So far so good.

Testing (
  Failed to compile: Unclosed group near index 4
  Compiled:  ^(\Q(\E)
  Match?:    true

But if we pass in a special construct such as (, the raw pattern fails to compile unless we quote it.

Testing \(
  Compiled:  ^(\()
  Match?:    false
  Compiled:  ^(\Q\(\E)
  Match?:    true

If we pass in an escaped construct, the raw pattern compiles successfully but doesn't match the input string. That's not the end of the world, since it would match a bare (, but it's counterintuitive: it breaks the expectation that what we pass in is what we match.
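A small check of the "it would match (" remark, using the raw pattern ^(\() from the output above; EscapedConstructDemo is a hypothetical name:

```java
import java.util.regex.Pattern;

// Hypothetical demo: the raw pattern built from the input \( escapes the paren,
// so it matches a bare "(" rather than the two characters \(.
class EscapedConstructDemo {
    static final Pattern RAW = Pattern.compile("^(\\()"); // regex text: ^(\()

    static boolean matches(String input) {
        return RAW.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("("));   // true
        System.out.println(matches("\\(")); // false: the input is \(
    }
}
```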

Testing \\(
  Failed to compile: Unclosed group near index 6
  Compiled:  ^(\Q\\(\E)
  Match?:    true

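Now we doubly escape the pattern, as if the input were a Java string literal. The raw pattern no longer compiles, and the quoted pattern matches only the literal text \\(, never a bare (. That is exactly what happens with your modified file, which is why nothing matches. It also shows how easy it is to get confused about exactly how much needs to be escaped.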

Testing [letters]
  Compiled:  ^([letters])
  Match?:    false
  Compiled:  ^(\Q[letters]\E)
  Match?:    true

Finally, suppose we want to match a string that also happens to be a valid regular expression. The raw pattern compiles successfully, so nothing alerts us to the problem, but it fails to match the expected string: [letters] is a character class that matches a single l, e, t, r or s.
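A quick sketch of why the raw [letters] pattern fails silently: with a tokenizer-style find(), the character class consumes one letter instead of the whole word. CharClassDemo and firstMatch are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: unquoted, "[letters]" means ONE character out of l, e, t, r, s.
class CharClassDemo {
    static final Pattern RAW = Pattern.compile("^([letters])");

    // Returns the prefix a find()-based tokenizer would consume, or null if none.
    static String firstMatch(String input) {
        Matcher m = RAW.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[letters]")); // null: '[' is not in the class
        System.out.println(firstMatch("letters"));   // l: only a single character
    }
}
```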

As you can see, Pattern.quote() worked every time, and it keeps the regular expression's implementation details out of your data file. The text file only says what to match, not how the match happens, and that kind of separation makes the code more robust.
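Applied to the question's code, a minimal sketch of the fix might look like this, assuming the file goes back to plain lexemes such as ) rparen. All names are hypothetical; a StringReader stands in for the file, IllegalArgumentException stands in for the question's ParserException, and tokens are collected into a List rather than the question's tokenList map. A LinkedHashMap (also an assumption) makes the patterns be tried in file order, which a HashMap does not guarantee:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the file holds plain lexemes, Pattern.quote() does all escaping.
class QuotedTokenizer {
    // LinkedHashMap keeps file order, so earlier lines are tried first.
    private final Map<Pattern, String> infoList = new LinkedHashMap<>();

    QuotedTokenizer(BufferedReader reservedWordsRead) {
        try {
            String word;
            while ((word = reservedWordsRead.readLine()) != null) {
                String[] k = word.split(" ");
                // Quote the raw lexeme; the file needs no backslashes at all.
                infoList.put(Pattern.compile("^(" + Pattern.quote(k[0]) + ")"), k[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns token names in input order; throws if no lexeme matches the remaining input.
    List<String> tokenize(String str) {
        List<String> tokens = new ArrayList<>();
        String s = str.trim();
        while (!s.isEmpty()) {
            boolean match = false;
            for (Map.Entry<Pattern, String> entry : infoList.entrySet()) {
                Matcher m = entry.getKey().matcher(s);
                if (m.find()) {
                    match = true;
                    tokens.add(entry.getValue());
                    s = s.substring(m.end()).trim(); // drop the matched prefix
                    break;
                }
            }
            if (!match) {
                throw new IllegalArgumentException("Unexpected character in input: " + s);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the reserved-words file.
        BufferedReader file = new BufferedReader(new StringReader(") rparen\n( lparen\n"));
        QuotedTokenizer t = new QuotedTokenizer(file);
        System.out.println(t.tokenize("( ) (")); // [lparen, rparen, lparen]
    }
}
```

Using s.substring(m.end()) instead of m.replaceFirst("") gives the same result for these anchored patterns without going through replacement-string processing.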

Of course, if you do want the file to contain a list of regular expressions, you should not use Pattern.quote(). Instead, make it clear to users that each entry must be a valid Java regular expression, and that poorly written patterns can produce confusing results.
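If the file is meant to hold regular expressions, one option (a sketch, not something this answer prescribes) is to compile every line while loading and report the offending line immediately, instead of discovering bad patterns at match time. RegexRuleLoader and load are hypothetical names, and the lines are passed as an array rather than read from a file:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch: lines are real Java regexes, so no Pattern.quote(),
// but each one is validated at load time and reported with its line number.
class RegexRuleLoader {
    static Map<Pattern, String> load(String[] lines) {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        for (int i = 0; i < lines.length; i++) {
            String[] k = lines[i].split(" ");
            try {
                rules.put(Pattern.compile("^(" + k[0] + ")"), k[1]);
            } catch (PatternSyntaxException e) {
                throw new IllegalArgumentException(
                    "Line " + (i + 1) + ": invalid regex '" + k[0] + "': " + e.getDescription(), e);
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        // In a regex file, a literal paren is written \) (a single backslash).
        Map<Pattern, String> rules = load(new String[] {"\\) rparen", "[0-9]+ number"});
        System.out.println(rules.values()); // [rparen, number]
        try {
            load(new String[] {"( lparen"}); // unescaped paren: unclosed group
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```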


1 Comment

Wow, this is very useful. I also didn't really want a list of regular expressions in my file. Thanks!

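Write \) rparen in the file. The backslash only has to be doubled inside a Java string literal such as "...\\)...", where \\ represents a single backslash; text read from a file is not a Java literal. Then Pattern.quote() is not needed. quote() also works, it's just a bit more roundabout.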
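A minimal sketch of this answer's point, with the file text \) written as the Java literal "\\)"; SingleBackslashDemo and matchesParen are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical demo: a single backslash in the file already escapes the paren.
class SingleBackslashDemo {
    // Builds ^(lexeme) either raw or via Pattern.quote() and tests it against ")".
    static boolean matchesParen(String lexeme, boolean quote) {
        String body = quote ? Pattern.quote(lexeme) : lexeme;
        return Pattern.compile("^(" + body + ")").matcher(")").find();
    }

    public static void main(String[] args) {
        String fromFile = "\\)"; // two characters, \ and ), as read from the line \) rparen
        System.out.println(fromFile.length());             // 2
        System.out.println(matchesParen(fromFile, false)); // true: regex ^(\)) matches ")"
        System.out.println(matchesParen(fromFile, true));  // false: quoting makes \) literal again
    }
}
```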

2 Comments

Could you explain to this mystified reader what you mean by "it also does it a bit more circumstantial"?
You can check it yourself with System.out.println(Pattern.quote("(")); which gives \Q(\E.
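A short sketch of what quote() produces; QuoteDemo and quotedMatchesItself are hypothetical names. quote() wraps the whole text in \Q...\E rather than escaping individual characters, which is presumably the "roundabout" part:

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // True if quote(text), compiled as a regex, matches exactly the original text.
    static boolean quotedMatchesItself(String text) {
        return Pattern.compile(Pattern.quote(text)).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("("));     // \Q(\E
        System.out.println(Pattern.quote("\\\\)")); // \Q\\)\E, the key body from the question
        // Even text containing \E (which would end a \Q...\E block) is quoted safely.
        System.out.println(quotedMatchesItself("a\\Eb"));
        System.out.println(quotedMatchesItself("[letters]"));
    }
}
```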
