1

I am trying to replace all the characters within an HTML font tag with an expression. I wrote a little test program, but it is not working correctly. Here is my regular expression:

test.replaceAll("<font\b(.*)>", "Something");

This does not work.

Why?

3
  • What isn't working? Is it not replacing anything? Replacing too much? Commented Apr 14, 2011 at 21:10
  • Can you provide an example of source text and desired output? Commented Apr 14, 2011 at 21:11
  • You can also play around with experimental regexes at refiddle.com which might help nail down the issue. Commented Apr 14, 2011 at 21:11

2 Answers

5

Note that the * quantifier is greedy: it matches as much as possible, so .* runs on to the last > in the input. For example,

String test = "<font size=\"10\"><b>hello</b></font>";
System.out.println(test.replaceAll("<font\\b(.*)>", "Something"));

prints

Something

You may want to use [^>]*, which cannot run past the first >:

test.replaceAll("<font\\b([^>]*)>", "Something")

or a reluctant quantifier, *?:

test.replaceAll("<font\\b(.*?)>", "Something")

which both result in

Something<b>hello</b></font>
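To see the three variants side by side, here is a minimal runnable sketch. The class name FontTagDemo, the method names, and the two-tag input string are made up for illustration; only the three patterns come from the answer above.

```java
public class FontTagDemo {
    // Greedy: .* grabs everything, then backs off only as far as the LAST '>'.
    static String greedy(String s) {
        return s.replaceAll("<font\\b(.*)>", "Something");
    }

    // Reluctant: .*? grows one character at a time and stops at the FIRST '>'.
    static String reluctant(String s) {
        return s.replaceAll("<font\\b(.*?)>", "Something");
    }

    // Negated class: [^>]* cannot consume a '>', so it also stops at the first one.
    static String negated(String s) {
        return s.replaceAll("<font\\b([^>]*)>", "Something");
    }

    public static void main(String[] args) {
        // Hypothetical input containing two font tags.
        String test = "<font size=\"10\">a</font> and <font color=\"red\">b</font>";
        System.out.println(greedy(test));    // Something
        System.out.println(reluctant(test)); // Somethinga</font> and Somethingb</font>
        System.out.println(negated(test));   // Somethinga</font> and Somethingb</font>
    }
}
```

With two tags in the input, the greedy version swallows everything from the first <font to the final >, while the other two replace each opening tag separately.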

1 Comment

Yep. This is exactly what I wanted.
2

You probably want two "\" before the "b":

test.replaceAll("<font\\b(.*)>", "Something");

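You need this because the regular expression is written as a Java string literal, where a backslash must itself be escaped. With a single backslash, "\b" is the string escape for a backspace character, so the regex engine never sees a word boundary.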
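A small sketch of why the single-backslash version silently matches nothing. The class and method names here are hypothetical; the behaviour relies on "\b" in a Java string literal being the backspace character (U+0008).

```java
public class BackslashDemo {
    // "\b" is resolved by the Java compiler to a backspace character, so the
    // regex engine receives: <font, U+0008, (.*)> and looks for a real backspace.
    static String singleBackslash(String s) {
        return s.replaceAll("<font\b(.*)>", "Something");
    }

    // "\\b" reaches the regex engine as \b, the word-boundary assertion.
    static String doubleBackslash(String s) {
        return s.replaceAll("<font\\b(.*)>", "Something");
    }

    public static void main(String[] args) {
        String test = "<font size=\"10\"><b>hello</b></font>";
        System.out.println("<font\b".length());    // 6: five characters plus one backspace
        System.out.println("<font\\b".length());   // 7: five characters, a backslash, and 'b'
        System.out.println(singleBackslash(test)); // unchanged, the input has no backspace
        System.out.println(doubleBackslash(test)); // Something
    }
}
```

Because the single-backslash pattern is still a valid regex, there is no exception or warning; the call just returns the input untouched.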

To make it only match up to the first ">", do this:

test.replaceAll("<font\\b(.*?)>", "Something");

This makes the * "lazy", so that it matches as little as possible rather than as much as possible.

However, it is better to write this particular expression as follows:

test.replaceAll("<font\\b([^>]*)>", "Something");

For this input it has the same effect, and it avoids backtracking, which should improve performance.
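One difference between the two forms that may matter in practice: by default the dot does not match line terminators, while [^>] does. The sketch below uses a made-up input whose tag is split across two lines; the class name, method names, and input are assumptions for illustration only.

```java
public class MultilineTagDemo {
    // Reluctant dot: '.' does not match '\n' by default, so a tag that
    // spans two lines is never matched.
    static String lazyDot(String s) {
        return s.replaceAll("<font\\b(.*?)>", "Something");
    }

    // Negated class: [^>] matches any character except '>', newlines included.
    static String negated(String s) {
        return s.replaceAll("<font\\b([^>]*)>", "Something");
    }

    // The (?s) flag (DOTALL) lets '.' match line terminators as well.
    static String lazyDotAll(String s) {
        return s.replaceAll("(?s)<font\\b(.*?)>", "Something");
    }

    public static void main(String[] args) {
        // Hypothetical input: attributes wrapped onto a second line.
        String test = "<font\n  size=\"10\">hi</font>";
        System.out.println(lazyDot(test).equals(test)); // true, nothing was replaced
        System.out.println(negated(test));              // Somethinghi</font>
        System.out.println(lazyDotAll(test));           // Somethinghi</font>
    }
}
```

So if the markup may wrap attributes across lines, the [^>]* form behaves the same either way, whereas the dot-based forms need the DOTALL flag.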

6 Comments

Yay for the quirkiness of Java and requiring you to escape backslashes in a regex. For reference, in Java \\b would be the word boundary match, \\\\b would match a literal \b.
My favorite example is matching a backslash: you need "\\\\" for one.
@TikhonJelvis: Whoops, I was editing my comment to clarify why this works when you posted your last comment.
Don't use the greedy dot-star! Use [^>]*
@Tikhon: Yes, that’s a Java regex library flaw. Many of us who care about such things as simplicity of regexes long ago wrote a super-simple front-end to the Pattern class that lets you use normal slashes instead of backslashes, thus allowing things like /w+ for a 7-bit ASCII word, /s+ for 7-bit whitespace — and of course, simply / for a literal backslash. It’s a touch weird at first, but infinitely preferable to the error-prone aggravation of How many backslashes does it take to get to the center of a tootsie-roll pop? Too many hours lost debugging ever to go back to raw Java regexes.
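
For anyone counting backslashes after reading these comments, here is a minimal sketch of the cases mentioned. The class name, method names, and the sample path are hypothetical; Pattern.quote is the standard java.util.regex helper for treating a string as a literal.

```java
import java.util.regex.Pattern;

public class LiteralBackslashDemo {
    // The regex \\ matches one backslash; as a Java string literal that is "\\\\".
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Same result, but Pattern.quote builds the literal pattern,
    // so only the Java-level escaping ("\\") is needed.
    static String toForwardSlashesQuoted(String path) {
        return path.replaceAll(Pattern.quote("\\"), "/");
    }

    // The regex \\b (Java literal "\\\\b") matches a backslash followed by 'b',
    // not a word boundary.
    static boolean containsBackslashB(String s) {
        return Pattern.compile("\\\\b").matcher(s).find();
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt"; // the actual text is C:\temp\file.txt
        System.out.println(toForwardSlashes(path));       // C:/temp/file.txt
        System.out.println(toForwardSlashesQuoted(path)); // C:/temp/file.txt
        System.out.println(containsBackslashB("a\\b"));   // true
        System.out.println(containsBackslashB("a b"));    // false
    }
}
```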