Adding whitespace handling to existing Java regex

Question

A long time ago I wrote a method called detectBadChars(String) that inspects the String argument for instances of so-called "bad" characters.

The original list of bad characters was:

'~'
'#'
'@'
'*'
'+'
'%'

My method, which works great, is:

// Detects bad chars in a string and returns the
// bad chars that were found.
protected String detectBadChars(String text) {
    Pattern pattern = Pattern.compile("[~#@*+%]");
    Matcher matcher = pattern.matcher(text);

    StringBuilder violatorsBuilder = new StringBuilder();

    if(matcher.find()) {
        String group = matcher.group();
        if (!violatorsBuilder.toString().contains(group))
            violatorsBuilder.append(group);
    }

    return violatorsBuilder.toString();
}

The business logic has now changed, and the following are now also considered to be bad:

Carriage returns (\r)
New lines (\n)
Tabs (\t)
Any consecutive whitespaces ("  ", "   ", etc.)

So I am trying to modify the regex to accommodate the new bad characters. Changing the regex to:

    Pattern pattern = Pattern.compile("[~#@*+%\n\t\r[ ]+]");

...throws exceptions. My thinking was that adding "\n\t\r" to the regex would allow for newlines, tabs and CRs respectively, and that adding "[ ]+" would add a new "class/group" consisting of whitespace and quantify that group as matching 1+ of those whitespaces, effectively taking care of consecutive whitespace.

Where am I going awry, and what should my regex be (and why)? Thanks in advance!

Rohit Jain · Accepted Answer · 2013-08-27 18:17:09Z

6

Using \\s alone will account for all of them. Put the + quantifier on the entire character class to match one or more repetitions:

Pattern.compile("[~#@*+%\\s]+");

Note that in a Java string literal you need to escape the backslash, so it's \\s and not \s.
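As a rough sketch of how this pattern could be dropped into the method from the question: the class name WhitespaceClassDemo and the static signature are assumptions made for illustration, and the loop uses `while (matcher.find())` rather than the question's single `if`, so that every match is visited. Because the + lets one match span a run of several bad characters, each character of the match is checked individually.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceClassDemo {
    // In Java source, "\\s" yields the two-character regex token \s (any whitespace).
    private static final Pattern BAD = Pattern.compile("[~#@*+%\\s]+");

    // Collects each distinct offending character, in order of first appearance.
    static String detectBadChars(String text) {
        Matcher matcher = BAD.matcher(text);
        StringBuilder violators = new StringBuilder();
        while (matcher.find()) {
            // One match may cover a run such as "~ #", so inspect every char in it.
            for (char c : matcher.group().toCharArray()) {
                if (violators.indexOf(String.valueOf(c)) < 0) {
                    violators.append(c);
                }
            }
        }
        return violators.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + detectBadChars("a~b#c~d") + "]"); // prints [~#]
        System.out.println("[" + detectBadChars("one two") + "]"); // prints [ ]
    }
}
```

Note that with this pattern even a single space between two words is reported, since \s matches any one whitespace character.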

4 Comments

Fritz Over a year ago

Argh! for 2 seconds! (+1)

MByD Over a year ago

+1 also note that the exception is thrown because you need to escape the backslashes themselves.

Alan Moore Over a year ago

You're just forbidding all whitespace. The way I read it, \r, \n and \t are always forbidden, but a simple space character is okay--it's just two or more consecutive spaces that aren't allowed.

Rohit Jain Over a year ago

@AlanMoore. I see that OP was doing [ ]+ to match 1 or more whitespace. So maybe he meant only that, and just didn't put it into words correctly.

CodeHelp · Accepted Answer · 2013-08-29 01:06:51Z

-1

I think this should work.

Pattern.compile("[~#@*+%\n\t\r\\s{2,}]");

You need \\s{2,} to match any consecutive whitespaces.

Edit: I made a mistake above. Thanks to Alan Moore for pointing it out. Here is the new solution.

Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}")
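A small sketch of how the alternation behaves, assuming a hypothetical helper hasBadChars and class name AlternationDemo (neither is from the question). The left branch matches any single always-forbidden character; the right branch matches only a run of two or more whitespace characters, so a lone space passes.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // Left branch: one always-forbidden character (the \n, \t, \r here are
    // real control characters produced by the Java string literal).
    // Right branch: two or more consecutive whitespace characters of any kind.
    private static final Pattern BAD = Pattern.compile("[~#@*+%\n\t\r]|\\s{2,}");

    // Hypothetical helper: true if the text contains at least one violation.
    static boolean hasBadChars(String text) {
        return BAD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBadChars("one two"));   // false: a single space is allowed
        System.out.println(hasBadChars("one  two"));  // true: two consecutive spaces
        System.out.println(hasBadChars("one\ttwo"));  // true: tab
        System.out.println(hasBadChars("50%"));       // true: percent sign
    }
}
```

If only runs of the plain space character should count, rather than any whitespace, the right branch could be written as " {2,}" instead; that is a reading of the requirements, not something the question states outright.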

edited Aug 29, 2013 at 1:06

answered Aug 28, 2013 at 9:32

1 Comment

Alan Moore Over a year ago

Inside a character class, {2,} is not a quantifier meaning two or more, it's just a list of literal characters: {, 2, ,, or }.
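This can be checked with a quick sketch that isolates the whitespace part of the first pattern into its own character class. The class name BraceLiteralDemo and the helper matchesWhole are made up for the illustration.

```java
import java.util.regex.Pattern;

public class BraceLiteralDemo {
    // A character class containing \s plus the literal characters { 2 , }
    private static final Pattern CLASS = Pattern.compile("[\\s{2,}]");

    // Hypothetical helper: true if the whole string is matched by the class,
    // which can only happen for a string of exactly one character.
    static boolean matchesWhole(String s) {
        return CLASS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("{"));   // true: literal brace
        System.out.println(matchesWhole("2"));   // true: literal digit
        System.out.println(matchesWhole(","));   // true: literal comma
        System.out.println(matchesWhole(" "));   // true: one whitespace character
        System.out.println(matchesWhole("  "));  // false: the class matches one char, not a run
    }
}
```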
