2
public class test {
    public static void main(String[] args) {
        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        String regex = new String("^&\\S*;$");
        String value = test1.replaceAll(regex, "");
        System.out.println(test2.matches(regex));
        System.out.println(value);
    }
}

This gives me the following output:

true
N&oslash;rrebro, Denmark

How is that possible? Why does replaceAll() not register a match?

6
  • Not clear what the problem is. Your value comes from replacing test1, while your matches is testing test2. Commented Feb 27, 2018 at 15:53
  • Yes, but test2 is a substring of test1. test2 also matches with the regex. From my understanding replaceAll() looks for substrings that match a regex and replaces them with a given replacement. Commented Feb 27, 2018 at 15:55
  • Only test2 matches the regex. test1 does not. Commented Feb 27, 2018 at 15:55
  • Your test2 matches the regex, but only if it's a whole string and not a substring. Check what $ means at the end of a regex... Commented Feb 27, 2018 at 15:57
  • What is the difference between a substring and a whole string? Commented Feb 27, 2018 at 15:58

4 Answers

2

Your regex includes ^, which anchors the match to the very start of the string.

If you try

test1.matches(regex)

you will get false.
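
A minimal sketch of the difference between testing a whole string and searching for a substring. It assumes the original input contained the HTML entity &oslash; (the printed output suggests this), and the class and helper names (AnchorDemo, foundAnywhere) are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Hypothetical helper: true if the pattern matches some substring of the input.
    static boolean foundAnywhere(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String test1 = "N&oslash;rrebro, Denmark"; // assumed input
        String anchored = "^&\\S*;$";
        String unanchored = "&\\S*;";

        // matches() always requires the whole string to match.
        System.out.println(test1.matches(anchored));          // false
        // find() searches for substrings, but ^ still pins the match to index 0.
        System.out.println(foundAnywhere(anchored, test1));   // false
        // Without the anchors, a substring match is found.
        System.out.println(foundAnywhere(unanchored, test1)); // true
    }
}
```

replaceAll() searches the way find() does, so the anchored pattern finds nothing to replace in test1.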

2 Comments

I need the "&" character to be at the beginning. Shouldn't it match a substring that starts with "&"?
@StefanWatt In your test1 string, & is not at the beginning, it is after N.
2

You need to understand what ^ and $ mean.

You probably put them in there because you want to say:

At the start of each match, I want a &, then 0 or more non-whitespace characters, then a ; at the end of the match.

However, ^ and $ don't mean the start and end of each match. They mean the start and end of the string.

So you should remove the ^ and $ from your regex:

String regex = "&\\S*;";

Now it outputs:

true
Nrrebro, Denmark
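
To see the effect of the anchors side by side, here is a small sketch. It assumes the input strings contained the entity &oslash;, and the class and helper names (ReplaceDemo, strip) are only illustrative:

```java
public class ReplaceDemo {
    // Hypothetical helper: removes every substring that matches the regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String whole = "&oslash;";                    // the entity is the entire string
        String embedded = "N&oslash;rrebro, Denmark"; // the entity sits inside a longer string

        // Anchored: only replaces when the entity spans the whole string.
        System.out.println("[" + strip(whole, "^&\\S*;$") + "]"); // []
        System.out.println(strip(embedded, "^&\\S*;$"));          // N&oslash;rrebro, Denmark

        // Unanchored: the entity is removed wherever it occurs.
        System.out.println(strip(embedded, "&\\S*;"));            // Nrrebro, Denmark
    }
}
```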

"What character specifies the start and end of the match then?" you might ask. Well, since your regex is basically the pattern you are matching, the start of the regex is the start of the match (unless you have lookbehinds)!

1

It is possible because the ^&\S*;$ pattern matches the entire &oslash; string but does not match the entire N&oslash;rrebro, Denmark string. Here ^ requires the start of the string to be right before &, and $ requires the ; to appear right at the end of the string.

Just removing the ^ and $ anchors may not be enough, because \S* is a greedy pattern and may overmatch, e.g. in N&oslash;rrebro;.
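
The overmatch can be reproduced with a short sketch. The input with a trailing semicolon is the example from the paragraph above, assuming the entity &oslash;. GreedyDemo and strip are illustrative names only:

```java
public class GreedyDemo {
    // Hypothetical helper: removes every substring that matches the regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "N&oslash;rrebro;"; // note the second ';' at the end

        // Greedy \S* runs to the last ';' it can reach, so the match is
        // "&oslash;rrebro;" and only "N" is left.
        System.out.println(strip(input, "&\\S*;")); // N
    }
}
```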

You may use &\w+; or &\S+?; pattern, e.g.:

String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark


The &\w+; pattern matches a &, then 1 or more word chars, and then ;, anywhere inside the string. In &\S+?;, the \S+? part matches 1 or more chars other than whitespace, as few as possible (lazy), so the match stops at the first ; it reaches.
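
The two suggested patterns are not interchangeable. As a rough sketch, here is an input that also contains a numeric entity (&#248;). That input is an assumption added for illustration and is not part of the question. By default \w in Java covers only [a-zA-Z_0-9], so # is not matched by it:

```java
public class LazyDemo {
    // Hypothetical helper: removes every substring that matches the regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Assumed input: one named entity, one numeric entity, extra semicolons.
        String input = "N&oslash;rrebro; N&#248;rrebro;";

        // \w+ stops at the first ';' but cannot match '#'.
        System.out.println(strip(input, "&\\w+;"));   // Nrrebro; N&#248;rrebro;

        // Lazy \S+? also stops at the first ';' and accepts '#'.
        System.out.println(strip(input, "&\\S+?;"));  // Nrrebro; Nrrebro;
    }
}
```

Which one fits depends on whether numeric entities can appear in the data.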

1

You can use this regex: &(.*?);

        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        String regex = "&(.*?);";
        String value = test1.replaceAll(regex, "");
        System.out.println(test2.matches(regex));
        System.out.println(value);

Output:

true
Nrrebro, Denmark
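
One caveat, sketched under an assumed input that is not from the question: the . in &(.*?); also matches spaces, so a literal & followed later by a ; gets removed as well. DotDemo and strip are illustrative names:

```java
public class DotDemo {
    // Hypothetical helper: removes every substring that matches the regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Assumed input: a plain "&" before a ';' plus one real entity.
        String input = "Tom & Jerry; N&oslash;rrebro";

        // '.' matches the space, so "& Jerry;" is treated as a match too.
        System.out.println(strip(input, "&(.*?);")); // Tom  Nrrebro

        // \S refuses whitespace, so only the entity is removed.
        System.out.println(strip(input, "&\\S+?;")); // Tom & Jerry; Nrrebro
    }
}
```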
