1

I need to write a regex that replaces a with b, but only inside a <pre> tag.

Example

a <pre> c a <foo> a d </pre> a

Result

a <pre> c b <foo> b d </pre> a

Please help me write an expression for Java's String.replace function. The pre tags are guaranteed not to be nested.

  • Cue the torrent of "Don't use regular expressions to parse HTML"... Commented Feb 8, 2013 at 8:47
  • @T.J.Crowder It isn't html actually, it's my simple markup Commented Feb 8, 2013 at 8:50
  • @RuiJarimba trying to write something like <pre>(.*?)a(.*?)</pre> -> $1b$2 but it isn't working Commented Feb 8, 2013 at 8:52
  • @Poma please clarify, do you need to change only 'a' characters in <pre> or you need to change whole content in <pre>? Commented Feb 8, 2013 at 8:54
  • @T.J.Crowder ok I can replace </?pre> with ` char so it doesn't look like html anymore. Commented Feb 8, 2013 at 9:23

3 Answers

3

I think the best you can do with String.replace() is something like:

String string = ...
for (;;)
{
    String original = string;
    string = string.replaceFirst("(<pre>.*?)a(.*?</pre>)", "$1b$2");
    if (original.equals(string))
        break;
}

(EDIT: @Bohemian has noted that the above regex doesn't work correctly. So it needs to be changed to:
(<pre>(?:(?!</pre>).)*)a((?:(?!<pre>).)*</pre>) (untested) to avoid matching outside a <pre>...</pre> section. With this change, we don't need the *? quantifier and can use the more common "greedy" (*) quantifier. This is starting to look a lot like my other answer, which I only really meant as a joke!)
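As a rough check of the edited regex, here is a minimal, self-contained sketch of the same replaceFirst loop. The class name PreLoopReplace and the method name replaceInsidePre are made up for illustration, and the pattern is the one from the edit with its first capturing group closed before the a. It assumes the replacement text never contains the target character, since otherwise the loop would not terminate.

```java
public class PreLoopReplace {
    // Group 1: "<pre>" plus body text that never crosses "</pre>".
    // Group 2: body text that never crosses another "<pre>", then "</pre>".
    private static final String REGEX =
        "(<pre>(?:(?!</pre>).)*)a((?:(?!<pre>).)*</pre>)";

    // Hypothetical helper: replaces one 'a' per pass until nothing changes.
    static String replaceInsidePre(String input) {
        String current = input;
        while (true) {
            String next = current.replaceFirst(REGEX, "$1b$2");
            if (next.equals(current)) {
                return next; // no 'a' left inside any <pre> section
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceInsidePre("a <pre> c a <foo> a d </pre> a"));
    }
}
```

Each pass rescans the string from the start, so this is quadratic in the worst case; the Matcher approach makes a single pass.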

You're better off using a Matcher (following code off the top of my head):

import java.util.regex.Pattern;
import java.util.regex.Matcher;

Pattern pattern = Pattern.compile("(?<=<pre>)(.*?)(?=</pre>)");
Matcher matcher = pattern.matcher(string);
StringBuffer replacement = new StringBuffer();

while (matcher.find())
{
     matcher.appendReplacement(replacement, "");
     // Careful using unknown text in appendReplacement as any "$n" will cause problems
     replacement.append(matcher.group(1).replace("a", "b"));
}    
matcher.appendTail(replacement);
String result = replacement.toString();

Edit: Changed pattern above so that it does not match surrounding <pre> and </pre>.
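For reference, a self-contained variant of the Matcher idea might look like the sketch below. It is not the exact code above: it matches the tags themselves and copies text by index instead of calling appendReplacement, which sidesteps the "$n" problem entirely. The class name PreReplacer, the method replaceInsidePre, and the use of Pattern.DOTALL (so a section may span several lines) are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreReplacer {
    // One non-nested <pre>...</pre> section; group 1 is the body.
    // DOTALL is an assumption so that '.' also matches line breaks.
    private static final Pattern PRE =
        Pattern.compile("<pre>(.*?)</pre>", Pattern.DOTALL);

    // Hypothetical helper: literal replacement, applied only to section bodies.
    static String replaceInsidePre(String input, String target, String replacement) {
        Matcher matcher = PRE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the text already copied to 'out'
        while (matcher.find()) {
            // Copy everything up to the body unchanged (including "<pre>").
            out.append(input, last, matcher.start(1));
            // Copy the body with the literal replacement applied.
            out.append(matcher.group(1).replace(target, replacement));
            last = matcher.end(1);
        }
        out.append(input, last, input.length()); // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a <pre> c b <foo> b d </pre> a
        System.out.println(replaceInsidePre("a <pre> c a <foo> a d </pre> a", "a", "b"));
    }
}
```

Because String.replace treats both arguments literally, the target and replacement can contain characters such as $ or \ without any escaping.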


6 Comments

You are replacing "a" inside <tag> too.
@AchintyaJha it was my mistake in example. Yes "a" inside <tag> should be replaced
I think your regex will replace "a" in "<pre>x</pre>a<pre>x</pre>"
@Bohemian: I've edited my answer to indicate that the regex is wrong and provided another one
You could greatly simplify your code by using lookarounds, and you don't need to be non-greedy (OP guarantees no nesting): (?<=<pre>).*(?=</pre>). Being non-consuming, lookarounds would mean you don't need to replace the consumed tags.
0

Here's a regex that will do the job (I think: I wouldn't bet too much on it passing all tests)

String replacement = original.replaceAll(
    "(?<=<pre>(?:(?!</pre>).){0,50})a(?=(?:(?!<pre>).)*</pre>)", 
    "b");

Explanation:

  • (?<=<pre>(?:(?!</pre>).){0,50}) - look-behind for a preceding <pre> so long as we don't traverse back over </pre> to find it. Java requires a finite maximum length look-behind so we use {0,50} rather than *.
  • a - The character we want to replace
  • (?=(?:(?!<pre>).)*</pre>) - Look ahead for </pre> so long as we don't traverse past <pre> to find it.
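A small, self-contained way to try the pattern is sketched below; the class name LookaroundDemo and the method replaceA are hypothetical. Note that because of the {0,50} bound, an a that sits more than 50 characters after its opening <pre> would be left unchanged, so the bound has to be chosen to cover the longest expected section.

```java
public class LookaroundDemo {
    // Look-behind: a "<pre>" at most 50 characters back, with no "</pre>" in between.
    // Look-ahead: a "</pre>" further on, with no "<pre>" in between.
    private static final String INSIDE_PRE =
        "(?<=<pre>(?:(?!</pre>).){0,50})a(?=(?:(?!<pre>).)*</pre>)";

    // Hypothetical helper wrapping the single replaceAll call.
    static String replaceA(String input) {
        return input.replaceAll(INSIDE_PRE, "b");
    }

    public static void main(String[] args) {
        System.out.println(replaceA("a <pre> c a <foo> a d </pre> a"));
    }
}
```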


-1
Try this:

Pattern pattern = Pattern.compile("<pre>(.+?)</pre>");
java.util.regex.Matcher matcher = pattern.matcher("a <pre> c a <tag> a d </pre> a");
