I have the following code:

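import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args){
    StringBuilder content = new StringBuilder("abcd efg h i. -  – jk(lmn) qq zz.");
    // Intended: remove '.', '-' or '–' (U+2013) when followed by a space or the end of input
    String patternSource = "[.-–]($| )";
    Pattern pattern = Pattern.compile(patternSource);
    Matcher matcher = pattern.matcher(content);
    System.out.println(matcher.replaceAll(""));
}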

where the patternSource character class consists of a dot, a minus sign and the \u2013 character (an en dash). Upon execution it gives me

abcefi-  jk(lmn) qzz

If I change the order of the symbols in my character class in any way, it begins to work normally and gives

abcd efg h i jk(lmn) qq zz

What the hell?

Tested under JDK/JRE 1.6.0_23

1 Answer

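If you have an unescaped hyphen between two characters in a character class, it has a special meaning: it denotes a range of characters. For example, [A-Z] means all the characters from A to Z inclusive. In your class [.-–], the hyphen is therefore read as the range from . (U+002E) to – (U+2013), which includes every digit and every ASCII letter. That is why any letter followed by a space was removed, while the hyphen itself (U+002D, just below the range) survived.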
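
A small check shows which characters the class from the question actually matches. This is a sketch only; the class name RangeDemo and the helper matchesQuestionClass are made up for illustration:

```java
import java.util.regex.Pattern;

public class RangeDemo {
    // The character class from the question. The hyphen sits between
    // '.' (U+002E) and '–' (U+2013), so it is parsed as ONE range U+002E..U+2013.
    static final Pattern QUESTION_CLASS = Pattern.compile("[.-\u2013]");

    // Hypothetical helper: true if the single character c is matched by the class.
    static boolean matchesQuestionClass(char c) {
        return QUESTION_CLASS.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Letters and digits lie inside U+002E..U+2013, so each of these prints true.
        for (char c : "aZ7.\u2013".toCharArray()) {
            System.out.println("'" + c + "' -> " + matchesQuestionClass(c));
        }
        // '-' (U+002D), ')' (U+0029) and ' ' (U+0020) are below U+002E, so these print false.
        for (char c : "-) ".toCharArray()) {
            System.out.println("'" + c + "' -> " + matchesQuestionClass(c));
        }
    }
}
```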

The exception is a hyphen that is the first or last character of the class: it is then treated literally and matches only a hyphen. Escaping it (\- in the regex, written "\\-" in a Java string literal) also makes it literal wherever it appears.
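
A minimal sketch comparing the placements, assuming a hypothetical helper strip(...) that builds the question's pattern from a given character class. The three fixed variants should all print the expected result, while the original order reproduces the broken output:

```java
import java.util.regex.Pattern;

public class HyphenPlacementDemo {
    // Hypothetical helper: removes any character from charClass that is
    // followed by a space or the end of input, as in the question's pattern.
    static String strip(String charClass, String input) {
        return Pattern.compile(charClass + "($| )").matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "abcd efg h i. -  \u2013 jk(lmn) qq zz.";
        // Hyphen first: literal '-'
        System.out.println(strip("[-.\u2013]", input));   // abcd efg h i jk(lmn) qq zz
        // Hyphen last: also literal
        System.out.println(strip("[.\u2013-]", input));   // abcd efg h i jk(lmn) qq zz
        // Hyphen escaped: "\\-" in Java source becomes \- in the regex
        System.out.println(strip("[.\\-\u2013]", input)); // abcd efg h i jk(lmn) qq zz
        // Original order: '.'..'–' is a range, so letters get removed too
        System.out.println(strip("[.-\u2013]", input));   // abcefi-  jk(lmn) qzz
    }
}
```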

2 Comments

Mark this answer as accepted by clicking the check mark to the left.
I know, but at that moment the time limit hadn't elapsed yet.
