The following regex has been validated on regex101 and works fine, matching either "()", "[]" or "{}":
\(\)|\[]|\{}
However:
it's not very readable
in Java it gets even less readable:
\\(\\)|\\[]|\\{}
but it still works fine, as my test class shows.
Now I'd like to make it more readable by using Unicode escapes (which I expected would avoid the regex escaping) and constants, defining it like this:
private static final String MATCH_OPENING_BRACE = "\u0028";
private static final String MATCH_CLOSING_BRACE = "\u0029";
private static final String MATCH_OPENING_SQUARE_BRACE = "\u005B";
private static final String MATCH_CLOSING_SQUARE_BRACE = "\u005D";
private static final String MATCH_OPENING_CURLY_BRACE = "\u007B";
private static final String MATCH_CLOSING_CURLY_BRACE = "\u007D";
private static final String MATCHING_OR_FLAG = "|";
private static final String COMPLETE_REGEX =
        MATCH_OPENING_BRACE + MATCH_CLOSING_BRACE
        + MATCHING_OR_FLAG + MATCH_OPENING_SQUARE_BRACE + MATCH_CLOSING_SQUARE_BRACE
        + MATCHING_OR_FLAG + MATCH_OPENING_CURLY_BRACE + MATCH_CLOSING_CURLY_BRACE;
private static final String REGEX_REPLACEMENT = "";
so that I can write readable code like this:
@Override
public boolean isValid(String input) {
    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll(COMPLETE_REGEX, REGEX_REPLACEMENT);
    return input.isEmpty();
}
instead of using that unreadable literal, like this:
@Override
public boolean isValid(String input) {
    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll("\\(\\)|\\[]|\\{}", "");
    return input.isEmpty();
}
But here the following exception is thrown:
java.util.regex.PatternSyntaxException: Unclosed character class near index 7
()|[]|{}
^
I tried adding an escape char, like this:
private static final String MATCH_OPENING_CURLY_BRACE = "\\\u007B";
but that only gives a similar exception:
java.util.regex.PatternSyntaxException: Unclosed character class near index 8
()|[]|\{}
^
Any hints?
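
For reference, the failure can be reproduced in isolation with a minimal sketch like the one below. The class name UnicodeEscapeRepro and the helper compileError are hypothetical, introduced only for illustration. It suggests that the Unicode constants concatenate to exactly the plain string ()|[]|{}, and that a bare [] on its own already triggers the same exception.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical reproduction class, not part of the original project.
class UnicodeEscapeRepro {

    // Returns the exception description if the regex does not compile,
    // or null if it compiles without error.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The Unicode escapes yield the very same string as the plain characters.
        String fromEscapes = "\u0028\u0029|\u005B\u005D|\u007B\u007D";
        System.out.println(fromEscapes.equals("()|[]|{}")); // true

        // The unescaped pattern fails, as in the stack trace above.
        System.out.println(compileError(fromEscapes)); // Unclosed character class

        // A bare pair of square brackets is enough to trigger it.
        System.out.println(compileError("[]")); // Unclosed character class

        // Escaping only the curly brace does not help.
        System.out.println(compileError("()|[]|\\{}")); // Unclosed character class

        // The fully escaped literal compiles.
        System.out.println(compileError("\\(\\)|\\[]|\\{}")); // null
    }
}
```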
Pattern p = compile(quote("{}") + "|" + quote("[]") + "|" + quote("()"));
You need to import static java.util.regex.Pattern.*; for that.

"\u0028" is exactly the same as "(" - the first is just harder to read. One of the first steps done at compilation is to translate \uxxxx to the corresponding Unicode code point (JLS 3.2, Lexical Translations).

Whether you pass { to the regex pattern as a Unicode escape or as a plain string literal, it's still going to be interpreted as a metacharacter. Unicode doesn't magically make regex accept it as a non-metacharacter. Same for all the other brackets. The exception you actually get here comes from [ (the unclosed character class), but all the other brackets will be treated as metacharacters too, so your regex wouldn't work as expected anyway.

1) "\u0028" is not more readable than "(" (actually my opinion); 2) it is the same as "(", so it does not change anything regarding escapes. Again, one of the first steps of compilation (basically part of reading the source) is to convert Unicode escape sequences (even in a string literal) to code points, so there is no difference for the compiler (e.g. you cannot write "\u000a", since a newline is NOT valid inside a string literal).
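
The Pattern.quote suggestion above can be combined with named constants to get the readability the question asks for. The following is a minimal sketch under stated assumptions: the class name BracketPairs and the constant names are hypothetical, and isValid is written as a static method rather than the @Override from the question so the example stays self-contained.

```java
import java.util.regex.Pattern;

// Hypothetical class and constant names, chosen only for illustration.
class BracketPairs {

    // Pattern.quote returns a literal pattern string for its argument,
    // so none of the brackets needs a manual backslash escape.
    private static final String ROUND_PAIR = Pattern.quote("()");
    private static final String SQUARE_PAIR = Pattern.quote("[]");
    private static final String CURLY_PAIR = Pattern.quote("{}");

    // Compiled once instead of on every replaceAll call.
    private static final Pattern ANY_EMPTY_PAIR =
            Pattern.compile(ROUND_PAIR + "|" + SQUARE_PAIR + "|" + CURLY_PAIR);

    static boolean isValid(String input) {
        // Same loop as in the question: each pass strips the adjacent
        // empty pairs, so nested pairs disappear from the inside out.
        for (int i = input.length() / 2; i > 0; i--) {
            input = ANY_EMPTY_PAIR.matcher(input).replaceAll("");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isValid("([]{})")); // true
        System.out.println(isValid("([)]"));   // false
    }
}
```

Precompiling the Pattern is a design choice of this sketch, not something the comments require; calling input.replaceAll with the same quoted string should behave the same way.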