Problems making a regex more readable than its literal [closed]

Question

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.

This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.

Closed 10 months ago.


The following regex has been validated on regex101 and works fine, matching "()", "[]" or "{}":

```
\(\)|\[]|\{}
```

However, it's not very readable, and in Java it becomes even less readable:
```
\\(\\)|\\[]|\\{}
```
It still works fine, though, as my test class shows.

Now I'd like to make it more readable by using Unicode (which should avoid escaping) and constants, defining it like this:

```java
private static final String MATCH_OPENING_BRACE = "\u0028";
private static final String MATCH_CLOSING_BRACE = "\u0029";

private static final String MATCH_OPENING_SQUARE_BRACE = "\u005B";
private static final String MATCH_CLOSING_SQUARE_BRACE = "\u005D";

private static final String MATCH_OPENING_CURLY_BRACE = "\u007B";
private static final String MATCH_CLOSING_CURLY_BRACE = "\u007D";

private static final String MATCHING_OR_FLAG = "|";

private static final String COMPLETE_REGEX = 
    MATCH_OPENING_BRACE + MATCH_CLOSING_BRACE 
    + MATCHING_OR_FLAG + MATCH_OPENING_SQUARE_BRACE + MATCH_CLOSING_SQUARE_BRACE
    + MATCHING_OR_FLAG + MATCH_OPENING_CURLY_BRACE + MATCH_CLOSING_CURLY_BRACE;

private static final String REGEX_REPLACEMENT = "";
```

so that I can write readable code like this:

```java
@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll(COMPLETE_REGEX, REGEX_REPLACEMENT);

    return input.isEmpty();
}
```

instead of using that unreadable literal, like this:

```java
@Override
public boolean isValid(String input) {

    for (int i = input.length() / 2; i > 0; i--)
        input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

    return input.isEmpty();
}
```
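For reference, here is a minimal, self-contained sketch of the literal-based version. The class name BracketValidator and the sample inputs are illustrative assumptions (the question's own test class is not shown), and isValid is made static so it can run on its own. It repeatedly strips adjacent pairs until nothing more can be removed.

```java
class BracketValidator {

    // Escaped literal from the question: "\\(", "\\[" and "\\{" reach the
    // regex engine as \( \[ \{, so those brackets are matched literally.
    private static final String PAIRS = "\\(\\)|\\[]|\\{}";

    // Hypothetical standalone version of the question's isValid(String).
    static boolean isValid(String input) {
        // In a balanced string every pass removes at least one adjacent pair,
        // so length / 2 passes are always enough.
        for (int i = input.length() / 2; i > 0; i--) {
            input = input.replaceAll(PAIRS, "");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isValid("{[()]}")); // true
        System.out.println(isValid("([)]"));   // false
    }
}
```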

But here the following exception is thrown:

```
java.util.regex.PatternSyntaxException: Unclosed character class near index 7
()|[]|{}
       ^
```
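The failure can be reproduced in isolation. Because the compiler translates \u0028 and the other Unicode escapes into plain characters before anything else happens, the constants concatenate to exactly ()|[]|{}. The regex engine then sees [ open a character class, takes the ] right after it as a literal member of that class, and never finds a closing ]. A minimal sketch (the class name UnicodeEscapeDemo is illustrative):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class UnicodeEscapeDemo {

    // Unicode escapes are translated by the compiler before tokenizing,
    // so this is the same as writing "()|[]|{}" directly.
    static final String COMPLETE_REGEX =
            "\u0028" + "\u0029" + "|" + "\u005B" + "\u005D" + "|" + "\u007B" + "\u007D";

    public static void main(String[] args) {
        System.out.println(COMPLETE_REGEX);                    // ()|[]|{}
        System.out.println(COMPLETE_REGEX.equals("()|[]|{}")); // true

        try {
            Pattern.compile(COMPLETE_REGEX);
        } catch (PatternSyntaxException e) {
            // '[' opens a character class; the ']' right after it is taken as a
            // literal member, so the class is never closed.
            System.out.println(e.getDescription()); // Unclosed character class
        }
    }
}
```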

I tried adding an escape char, like this:

```java
private static final String MATCH_OPENING_CURLY_BRACE = "\\\u007B";
```

but that only gives a similar exception:

```
java.util.regex.PatternSyntaxException: Unclosed character class near index 8
()|[]|\{}
        ^
```
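Escaping only the curly brace is not enough, because the [ is still unescaped and still opens a character class. Note that "\\\u007B" from the question is just the two-character string \{ once the compiler has translated the Unicode escape. A minimal sketch (the class and method names are illustrative) that checks which variants compile:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PartialEscapeDemo {

    // Returns true if the pattern compiles, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The question's constant is the two characters backslash and '{'.
        System.out.println("\\\u007B".equals("\\{"));       // true
        // Only '{' escaped: '[' still opens an unclosed character class.
        System.out.println(compiles("()|[]|\\{}"));         // false
        // '(', ')', '[' and '{' escaped: compiles, brackets matched literally.
        System.out.println(compiles("\\(\\)|\\[]|\\{}"));   // true
    }
}
```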

Any hints?

IMO the 'unreadability' (and the errors, BTW) comes from the fact that you're using metacharacters in the pattern, which therefore require quoting. I don't think using constants and/or Unicode escapes is going to help. You could make your intentions clear and solve the quoting problems with something like Pattern p = compile(quote("{}") + "|" + quote("[]") + "|" + quote("()")); For that you need import static java.util.regex.Pattern.*;
– g00se, Commented Jan 19 at 11:22
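A minimal, self-contained sketch of g00se's suggestion (the class name QuotedPairs and the static isValid are illustrative, not from the question). Pattern.quote wraps each sequence in \Q...\E, so no bracket needs a manual escape:

```java
import static java.util.regex.Pattern.quote;

class QuotedPairs {

    // quote("()") yields \Q()\E: everything between \Q and \E is literal.
    static final String PAIRS = quote("()") + "|" + quote("[]") + "|" + quote("{}");

    // Same pair-stripping loop as in the question.
    static boolean isValid(String input) {
        for (int i = input.length() / 2; i > 0; i--) {
            input = input.replaceAll(PAIRS, "");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(PAIRS);             // \Q()\E|\Q[]\E|\Q{}\E
        System.out.println(isValid("{[()]}")); // true
    }
}
```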
BTW, writing "\u0028" is exactly the same as writing "(", the first is just harder to read. One of the first steps of compilation is to translate \uxxxx escapes to the corresponding Unicode code points (JLS 3.2, Lexical Translations).
– user85421, Commented Jan 19 at 11:28
"Now I'd like to make it more readable by using Unicode (which should avoid escaping)": it avoids escaping in the string literal, but you still need to escape the pattern. If you pass { to the regex pattern, either as a Unicode escape or as a plain string literal, it's still going to be interpreted as a metacharacter. Unicode doesn't magically make the regex engine accept it as a non-metacharacter. The same goes for all the other brackets. You only get an actual error on {, but all the other brackets will be treated as metacharacters too, so your regex wouldn't work as expected anyway.
– VLAZ, Commented Jan 19 at 11:33
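VLAZ's point that the other brackets are still metacharacters shows up even when no exception is thrown. Unescaped, "()" is an empty capturing group that matches the empty string, so using it with replaceAll removes nothing. A small sketch (the class name is illustrative):

```java
class EmptyGroupDemo {
    public static void main(String[] args) {
        // "()" as a regex is an empty group: it matches the empty string
        // at every position, so the input is left unchanged.
        System.out.println("()".replaceAll("()", ""));     // ()
        // Escaped, the parentheses are matched literally and removed.
        System.out.println("()".replaceAll("\\(\\)", "")); // prints an empty line
    }
}
```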
@ccampisano I never wrote that escapes are not needed. You wrote "make it more readable by using Unicode", but: 1) as I already wrote, "\u0028" is not more readable than "(" (just my opinion); 2) it is the same as "(", so it does not change anything regarding escapes. Again, one of the first steps of compilation (basically part of reading the source) is to convert Unicode escape sequences, even inside a string literal, to characters, so there is no difference for the compiler (e.g. you cannot write "\u000a", since a newline is NOT valid inside a string literal).
– user85421, Commented Jan 19 at 11:34
input = input.replaceAll("\\(\\)|\\[\\]|\\{\\}", ""); As someone said, they are all metacharacters, so they all need escaping.
– g00se, Commented Jan 19 at 11:38

2 revs · Accepted Answer · 2025-01-19 11:58:35Z

3

Maybe use comments and whitespace to explain and format the expression:

```java
String regex = """
    (?x)     # allow comments and ignore whitespace
      \\(\\) # ()  escaped
    |        # or
      \\[]   # []  escaped
    |        # or
      \\{}   # {}  escaped
    """;
```

(The formatting can be changed to your liking.)
Drawback: any # or space that should be matched literally must also be escaped.
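A possible way to use the commented pattern, plus a quick check of the drawback, as a sketch assuming Java 15+ for text blocks (the class name, method name and sample strings are illustrative). The (?x) prefix is the inline form of the Pattern.COMMENTS flag; in that mode whitespace and #-comments are ignored, which is why a literal space or # needs a backslash:

```java
import java.util.regex.Pattern;

class CommentedRegex {

    // Text block: "\\(" inside the block becomes the two characters \( in the string.
    static final Pattern PAIRS = Pattern.compile("""
        (?x)     # allow comments and ignore whitespace
          \\(\\) # ()  escaped
        |        # or
          \\[]   # []  escaped
        |        # or
          \\{}   # {}  escaped
        """);

    // Same pair-stripping loop as the question's isValid, using the compiled pattern.
    static boolean isValid(String input) {
        for (int i = input.length() / 2; i > 0; i--) {
            input = PAIRS.matcher(input).replaceAll("");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isValid("([]{})")); // true
        System.out.println(isValid("(}"));     // false

        // The drawback: in comments mode an unescaped space or '#' is not literal.
        System.out.println(Pattern.matches("(?x)a b", "a b"));   // false (pattern is just "ab")
        System.out.println(Pattern.matches("(?x)a\\ b", "a b")); // true
        System.out.println(Pattern.matches("(?x)a\\#", "a#"));   // true
    }
}
```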

For longer sequences, Pattern#quote can be used; it's probably not so useful for short sequences (like ()).

edited Jan 19 at 11:58

community wiki

2 revs
user85421


1 Comment

user85421 Jan 19 at 12:09

(I'm not saying I unconditionally like it; it's like commenting code: must I comment i++?!)

ccampisano · Accepted Answer · 2025-01-19 12:57:07Z

-1

As @user85421 mentioned, using Unicode escapes doesn't remove the need for regex escaping, as I thought it would.

So escaping (, ), [ and { is still required. Here's a fix:

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "}";
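Plugging these corrected constants back into the question's code gives roughly the following self-contained sketch (the class name BraceConstants is illustrative, and isValid is made static so it can run on its own):

```java
class BraceConstants {

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "]";
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "}";
    private static final String MATCHING_OR_FLAG = "|";

    // Concatenates to the same pattern as the question's escaped literal.
    static final String COMPLETE_REGEX =
            MATCH_OPENING_BRACE + MATCH_CLOSING_BRACE
            + MATCHING_OR_FLAG + MATCH_OPENING_SQUARE_BRACE + MATCH_CLOSING_SQUARE_BRACE
            + MATCHING_OR_FLAG + MATCH_OPENING_CURLY_BRACE + MATCH_CLOSING_CURLY_BRACE;

    static boolean isValid(String input) {
        for (int i = input.length() / 2; i > 0; i--) {
            input = input.replaceAll(COMPLETE_REGEX, "");
        }
        return input.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(COMPLETE_REGEX.equals("\\(\\)|\\[]|\\{}")); // true
        System.out.println(isValid("[{()}]"));                          // true
    }
}
```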

The above works fine, and so does the following version, with all the metacharacters escaped (see @g00se's comment below):

    private static final String MATCH_OPENING_BRACE = "\\(";
    private static final String MATCH_CLOSING_BRACE = "\\)";
    
    private static final String MATCH_OPENING_SQUARE_BRACE = "\\[";
    private static final String MATCH_CLOSING_SQUARE_BRACE = "\\]";
    
    private static final String MATCH_OPENING_CURLY_BRACE = "\\{";
    private static final String MATCH_CLOSING_CURLY_BRACE = "\\}";

Indeed, as you can see, the original regex did NOT escape ALL the metacharacters, and it still works fine:

    input = input.replaceAll("\\(\\)|\\[]|\\{}", "");

All the tests also pass with every metacharacter escaped:

    input = input.replaceAll("\\(\\)|\\[\\]|\\{\\}", "");
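Both versions work because, in Java's regex syntax, a ] outside a character class and a } that does not close a quantifier are already literal characters; only (, ), [ and { need escaping in this pattern. A quick sketch comparing the two patterns on a few arbitrary inputs (the class name is illustrative):

```java
import java.util.List;

class EscapeComparison {

    static final String MINIMAL = "\\(\\)|\\[]|\\{}";     // ']' and '}' left unescaped
    static final String FULL    = "\\(\\)|\\[\\]|\\{\\}"; // every bracket escaped

    public static void main(String[] args) {
        for (String s : List.of("()", "[]", "{}", "([{}])", "a]b}c")) {
            String minimal = s.replaceAll(MINIMAL, "");
            String full = s.replaceAll(FULL, "");
            // Both patterns should give identical results for every input.
            System.out.println(s + " -> \"" + minimal + "\" / \"" + full + "\", same: " + minimal.equals(full));
        }
    }
}
```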

edited Jan 19 at 12:57

answered Jan 19 at 11:32

ccampisano


2 Comments

VLAZ Jan 19 at 11:36

At this point, what do you even gain from using this code? "\\(" is a lot more readable to me than private static final String MATCH_OPENING_BRACE = "\\\u0028";. Also, I don't even know if that's the correct character code. Probably, but I'd never be able to tell at a glance. I can tell at a glance what "\\(" is. And I can understand "\\(\\)|\\[]|\\{}" without needing to make multiple hops through the source code and then look up what the character codes mean.

g00se Jan 19 at 11:41

Your latest edit is still wrong (see my latest comment above)
