A long time ago I wrote a method called detectBadChars(String) that inspects the String argument for instances of so-called "bad" characters.
The original list of bad characters was:
- '~'
- '#'
- '@'
- '*'
- '+'
- '%'
My method, which works great, is:
// Detects bad chars in a string and returns the distinct
// bad chars that were found.
protected String detectBadChars(String text) {
    Pattern pattern = Pattern.compile("[~#@*+%]");
    Matcher matcher = pattern.matcher(text);
    StringBuilder violatorsBuilder = new StringBuilder();
    // Loop so every match is examined, not just the first one.
    while (matcher.find()) {
        String group = matcher.group();
        // Record each bad char only once.
        if (!violatorsBuilder.toString().contains(group))
            violatorsBuilder.append(group);
    }
    return violatorsBuilder.toString();
}
The business logic has now changed, and the following are now also considered to be bad:
- Carriage returns (\r)
- New lines (\n)
- Tabs (\t)
- Any consecutive whitespaces ("  ", "   ", etc.)
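To pin down the behavior I'm after, here is a rough, non-regex sketch of the new rules. The names BadCharSpec and findViolations are made up for illustration and are not in my real code, and I'm assuming "consecutive whitespaces" means two or more plain spaces in a row, reported once as a two-space string.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class BadCharSpec {
    // Hypothetical helper (not my real method): returns each distinct
    // violation once, in order of first appearance.
    public static Set<String> findViolations(String text) {
        Set<String> violations = new LinkedHashSet<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if ("~#@*+%\r\n\t".indexOf(c) >= 0) {
                // Original bad chars plus CR, newline and tab.
                violations.add(String.valueOf(c));
            } else if (c == ' ' && i + 1 < text.length() && text.charAt(i + 1) == ' ') {
                // Assumption: a run of two or more spaces counts as one violation.
                violations.add("  ");
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(findViolations("a~b#c"));             // prints [~, #]
        System.out.println(findViolations("one two").isEmpty()); // prints true
        System.out.println(findViolations("one  two").size());   // prints 1
    }
}
```

So a single space between words is fine, but a run of spaces, or any \r, \n or \t, should be reported.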
So I am trying to modify the regex to accommodate the new bad characters. Changing the regex to:
Pattern pattern = Pattern.compile("[~#@*+%\n\t\r[ ]+]");
...throws exceptions. My thinking was that adding "\n\t\r" to the regex would allow for newlines, tabs and CRs respectively, and that adding "[ ]+" would add a new "class/group" consisting of whitespace and then quantify that group as 1+ of those whitespaces, effectively taking care of consecutive whitespaces.
Where am I going awry, and what should my regex be (and why)? Thanks in advance!