-7

The following regex has been validated on regex101 and works fine, matching either "()", or "[]" or "{}":

\(\)|\[]|\{}

However:

  • it's not so readable

  • in Java it gets even less readable:

    \\(\\)|\\[]|\\{}
    

    but still works fine, as my test class shows.

Now I'd like to make it more readable by using Unicode (which should avoid escaping) and constants, defining it like this:

private static final String MATCH_OPENING_BRACE = "\u0028";
private static final String MATCH_CLOSING_BRACE = "\u0029";

private static final String MATCH_OPENING_SQUARE_BRACE = "\u005B";
private static final String MATCH_CLOSING_SQUARE_BRACE = "\u005D";

private static final String MATCH_OPENING_CURLY_BRACE = "\u007B";
private static final String MATCH_CLOSING_CURLY_BRACE = "\u007D";

private static final String MATCHING_OR_FLAG = "|";

private static final String COMPLETE_REGEX = 
    MATCH_OPENING_BRACE + MATCH_CLOSING_BRACE 
    + MATCHING_OR_FLAG + MATCH_OPENING_SQUARE_BRACE + MATCH_CLOSING_SQUARE_BRACE
    + MATCHING_OR_FLAG + MATCH_OPENING_CURLY_BRACE + MATCH_CLOSING_CURLY_BRACE;

private static final String REGEX_REPLACEMENT = ""; 

so that I can write readable code like this:

@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll(COMPLETE_REGEX, REGEX_REPLACEMENT);

    return input.isEmpty();
}

instead of using that unreadable literal, like this:

@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

    return input.isEmpty();
}

But here the following exception is thrown:

java.util.regex.PatternSyntaxException: Unclosed character class near index 7
    ()|[]|{}
          ^

I tried adding an escape char, like this:

private static final String MATCH_OPENING_CURLY_BRACE = "\\\u007B";

but that only gives a similar exception:

java.util.regex.PatternSyntaxException: Unclosed character class near index 8
    ()|[]|\{}
           ^

Any hints?

11
  • 3
    imo the 'unreadability' (and the errors btw) comes from the fact that you're using metacharacters in the pattern, which therefore require quoting. I don't think it's going to help by either using constants and/or Unicode expressions. You could make your intentions clear and solve quoting problems with something like Pattern p = compile(quote("{}") + "|" + quote("[]") + "|" + quote("()")); You need to import static java.util.regex.Pattern.*; for that Commented Jan 19 at 11:22
  • 3
    BTW writing "\u0028" is exactly the same as "(" - the first is just harder to read -- One of the first steps done at compilation is to translate \uxxxx to the corresponding unicode code point (JLS 3.2. Lexical Translations) Commented Jan 19 at 11:28
  • 4
    "Now I'd like to make it more readable by using Unicode (which should avoid escaping)" That only avoids escaping in the string literal. You still need to escape the pattern. If you pass { as either a Unicode escape or a string literal to the regex pattern, it's still going to be interpreted as a metacharacter. Unicode doesn't magically make regex accept it as a non-metacharacter. Same for all other brackets. The error you actually get is triggered by [ (the unclosed character class), but all other brackets will be treated as metacharacters too, so your regex wouldn't work as expected anyway. Commented Jan 19 at 11:33
  • 3
    ccampisano - I never wrote escapes are not needed; you wrote "make it more readable by using Unicode", but: 1) as I already wrote, "\u0028" is not more readable than "(" (actually my opinion); 2) it is the same as "(", so it does not change anything regarding escapes -- again, (one of) the first steps of compilation (basically part of reading the source) is to convert Unicode sequences (even in a string literal) to code points - so there is no difference for the compiler (e.g. you cannot write "\u000a" since a newline is NOT valid in a single string) Commented Jan 19 at 11:34
  • 1
    input = input.replaceAll("\\(\\)|\\[\\]|\\{\\}", ""); As someone said, they are all metacharacters so all need escaping Commented Jan 19 at 11:38
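
The points raised in the comments above can be checked with a small sketch. The class `BracketPatternDemo` and its two helper methods are hypothetical names introduced only for illustration. It assumes nothing beyond `java.util.regex` and shows three things: a Unicode escape yields exactly the same string as the plain character, the unescaped pattern is rejected by the regex compiler, and `Pattern.quote` is one way to build a working pattern without hand-written escapes.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
class BracketPatternDemo {

    // Builds the alternation from literal pairs; Pattern.quote makes each pair match literally.
    static Pattern quotedPattern() {
        return Pattern.compile(
                Pattern.quote("()") + "|" + Pattern.quote("[]") + "|" + Pattern.quote("{}"));
    }

    // Returns true if the regex compiler rejects the given pattern source.
    static boolean failsToCompile(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The Unicode escape is translated before the string literal is interpreted.
        System.out.println("\u0028".equals("("));                                  // true
        // Without regex escapes the brackets are metacharacters and compilation fails.
        System.out.println(failsToCompile("()|[]|{}"));                            // true
        // The quoted variant removes all three kinds of empty pairs.
        System.out.println(quotedPattern().matcher("a()b[]c{}").replaceAll(""));   // abc
    }
}
```

Whether the quoted form is more readable than `"\\(\\)|\\[]|\\{}"` is a matter of taste; its main benefit is that nobody has to remember which characters are metacharacters.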

2 Answers

3

Maybe using comments and spaces to explain and format the expression:

String regex = """
    (?x)     # allow comments and ignore whitespace
      \\(\\) # ()  escaped
    |        # or
      \\[]   # []  escaped
    |        # or
      \\{}   # {}  escaped
    """;

The formatting can be changed to your liking.
Drawback: any literal # or whitespace that the pattern should match must also be escaped.
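
For projects that cannot use text blocks, the same idea can be sketched with ordinary string concatenation and the `Pattern.COMMENTS` flag, which is the flag form of `(?x)`. The class name `CommentedRegexDemo` and the method `stripPairs` are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and layout are illustrative only.
class CommentedRegexDemo {

    // Every line ends with a newline so that each # comment stops at the end of its own line.
    static final Pattern BRACKET_PAIRS = Pattern.compile(
              "  \\(\\)   # an empty pair of parentheses\n"
            + "| \\[\\]   # an empty pair of square brackets\n"
            + "| \\{\\}   # an empty pair of curly braces\n",
            Pattern.COMMENTS);

    // Removes every directly adjacent bracket pair in a single pass.
    static String stripPairs(String input) {
        return BRACKET_PAIRS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripPairs("x(){}y[]"));  // xy
    }
}
```

Compiling the pattern once into a constant also avoids recompiling it on every `String.replaceAll` call.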


For longer sequences, Pattern#quote can be used. It is probably not so useful for short sequences (like "()").


1 Comment

(I do not say that I do unconditionally like that - it is like commenting code - must I comment i++?!)
-1

as @user85421 mentioned, using Unicode escapes doesn't make the regex escapes unnecessary, as I thought it would.

so, escaping (, ), [ and { is still required, here's a fix:

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "}";

the above works fine, as does the following, with all the meta-characters escaped (see @g00se's comment below):

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "\\]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "\\}";

indeed, the original regex did NOT escape ALL the meta-characters (] and } are left unescaped), as you can see, and it still works fine:

    input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

and all the tests also run fine with all meta-characters escaped:

input = input.replaceAll("\\(\\)|\\[\\]|\\{\\}", "");
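
To check the claim that both spellings behave the same, here is a minimal sketch that runs the two patterns side by side. The class `BracketValidator`, its constants, and the sample inputs are hypothetical and not taken from the original test class. Instead of the fixed `length / 2` iteration count, it loops until a pass no longer changes the string, which is an alternative stopping rule rather than the original code.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are not from the original post.
class BracketValidator {

    // Only the characters that strictly need escaping are escaped.
    static final Pattern MINIMAL = Pattern.compile("\\(\\)|\\[]|\\{}");
    // Every bracket character is escaped.
    static final Pattern FULLY_ESCAPED = Pattern.compile("\\(\\)|\\[\\]|\\{\\}");

    // Repeatedly strips adjacent matching pairs until a pass changes nothing.
    static boolean isValid(String input, Pattern pairs) {
        String previous;
        do {
            previous = input;
            input = pairs.matcher(input).replaceAll("");
        } while (!input.equals(previous));
        return input.isEmpty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"()[]{}", "{[()]}", "(]", "(("}) {
            System.out.println(s + " -> " + isValid(s, MINIMAL) + " / " + isValid(s, FULLY_ESCAPED));
        }
    }
}
```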

2 Comments

At this point, what do you even gain from using this code? "\\(" is a lot more readable to me than private static final String MATCH_OPENING_BRACE = "\\\u0028";. Also, I don't even know if that's the correct character code. Probably, but I'd never be able to tell by glance. I can tell by glance what "\\(" is. And I can understand "\\(\\)|\\[]|\\{}" without needing to do multiple hops in the source code then go look up what the character codes mean.
Your latest edit is still wrong (see my latest comment above)
