2

A long time ago I wrote a method called detectBadChars(String) that inspects the String argument for instances of so-called "bad" characters.

The original list of bad characters was:

  • '~'
  • '#'
  • '@'
  • '*'
  • '+'
  • '%'

My method, which works great, is:

// Detects bad chars in a string and returns each distinct bad char
// that was found, in order of first appearance.
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
protected String detectBadChars(String text) {
    Pattern pattern = Pattern.compile("[~#@*+%]");
    Matcher matcher = pattern.matcher(text);

    StringBuilder violatorsBuilder = new StringBuilder();

    // Loop over every match so that all bad chars are collected.
    while (matcher.find()) {
        String group = matcher.group();
        if (!violatorsBuilder.toString().contains(group))
            violatorsBuilder.append(group);
    }

    return violatorsBuilder.toString();
}

The business logic has now changed, and the following are now also considered to be bad:

  • Carriage returns (\r)
  • New lines (\n)
  • Tabs (\t)
  • Any consecutive whitespaces ("  ", "   ", etc.)

So I am trying to modify the regex to accommodate the new bad characters. Changing the regex to:

    Pattern pattern = Pattern.compile("[~#@*+%\n\t\r[ ]+]");

...throws exceptions. My thinking was that adding "\n\t\r" to the regex would allow for newlines, tabs and CRs respectively, and that adding "[ ]+" would add a new "class/group" consisting of whitespace and quantify that group as allowing 1+ of those whitespace characters, effectively taking care of consecutive whitespace.

Where am I going awry, and what should my regex be (and why)? Thanks in advance!

2 Answers

6

Just using \\s will account for all of them, since it matches any whitespace character. Then put the + quantifier on the entire character class to match one or more repetitions:

Pattern.compile("[~#@*+%\\s]+");

Note that in Java, you need to escape the backslashes. So it's \\s and not \s.
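
As a rough illustration of how this pattern behaves, here is a minimal sketch. The class name WhitespaceClassDemo and the helper findBadRuns are made up for this example and are not part of the original method; the find() loop collects each distinct run of bad characters.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not the asker's code.
class WhitespaceClassDemo {
    // "\\s" in a Java string literal becomes the regex \s (any whitespace char).
    private static final Pattern BAD = Pattern.compile("[~#@*+%\\s]+");

    // Returns each distinct matched run, in order of first appearance.
    static Set<String> findBadRuns(String text) {
        Set<String> runs = new LinkedHashSet<>();
        Matcher m = BAD.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findBadRuns("a~b"));
        System.out.println(findBadRuns("a b"));
        System.out.println(findBadRuns("ab"));
    }
}
```

With this pattern a single space counts as a match as well, because \s sits inside the class and the + only extends the run.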


4 Comments

Argh! for 2 seconds! (+1)
+1 also note that the exception is thrown because you need to escape the backslashes themselves.
You're just forbidding all whitespace. The way I read it, \r, \n and \t are always forbidden, but a simple space character is okay--it's just two or more consecutive spaces that aren't allowed.
@AlanMoore. I see that OP was doing [ ]+ to match 1 or more whitespace. So maybe that is what he meant, and he just didn't put it into words correctly.
-1

I think this should work.

Pattern.compile("[~#@*+%\n\t\r\\s{2,}]");

You need \\s{2,} to match any consecutive whitespace characters.

Edit: I made a mistake above. Thanks to Alan Moore for pointing it out. Here is the corrected solution.

Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}")
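
To see what the alternation form accepts and rejects, here is a small sketch. The class name ConsecutiveWhitespaceDemo and the helper hasBadChars are assumptions made for illustration only. The idea is that the first alternative flags any single listed character, while the second flags runs of two or more whitespace characters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class ConsecutiveWhitespaceDemo {
    // "\n", "\t" and "\r" are Java string escapes, so the regex receives the
    // actual control characters; "\\s{2,}" reaches the regex engine as \s{2,}.
    private static final Pattern BAD =
            Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}");

    // True if the text contains at least one violation.
    static boolean hasBadChars(String text) {
        return BAD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBadChars("hello world"));   // single space
        System.out.println(hasBadChars("hello  world"));  // two spaces
        System.out.println(hasBadChars("hello\tworld"));  // tab
    }
}
```

Under this reading a lone space is allowed, while a tab, a newline, a carriage return, or two or more whitespace characters in a row are not.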

1 Comment

Inside a character class, {2,} is not a quantifier meaning two or more, it's just a list of literal characters: {, 2, ,, or }.
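
A quick way to check this is the sketch below. The class name BracesInClassDemo and the helper matchesWhole are hypothetical, chosen only for this example. It compiles the bracketed form and tests individual inputs with matches(), which requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the comment above.
class BracesInClassDemo {
    // Inside [...], the characters {, 2, , and } are ordinary literals, so this
    // class matches exactly one character: whitespace, '{', '2', ',' or '}'.
    private static final Pattern IN_CLASS = Pattern.compile("[\\s{2,}]");

    // True only if the entire input is matched by the character class.
    static boolean matchesWhole(String text) {
        return IN_CLASS.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{", "2", ",", "}", " ", "  "}) {
            System.out.println("\"" + s + "\" -> " + matchesWhole(s));
        }
    }
}
```

Each single character in that list matches, but the two-space string does not, since a character class on its own consumes exactly one character.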
