For some reason this piece of Java code is giving me overlapping matches:

Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);

Is there any way or option to avoid detecting overlaps? E.g. leftContext rightContext rightContext should be 1 match instead of 2.

Here's the complete code:

public static String replaceWithContext(String input, String leftContext, String rightContext, String newString) {
  Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
  Matcher matcher = pat.matcher(input);
  StringBuffer buffer = new StringBuffer();

  while (matcher.find()) {
    matcher.appendReplacement(buffer, "");
    buffer.append(matcher.group(1) + newString + matcher.group(2));
  }
  matcher.appendTail(buffer);

  return buffer.toString();
}

So here's the final answer, using a negative lookahead. My bad for not realizing * was greedy:

Pattern pat = Pattern.compile("(" +
    leftContext + ")" + "(?:(?!" +
    rightContext + ").)*" + "(" +
    rightContext + ")", Pattern.DOTALL);
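
To make the difference concrete, here is a small sketch comparing the greedy pattern with the lookahead version. It assumes the sample values leftContext="ab" and rightContext="cd"; the class GreedyDemo and the helper countMatches are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class GreedyDemo {
    // Counts how many separate matches the regex finds in the input (DOTALL enabled).
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "ab 1 cd ab 2 cd";

        // Greedy: .* runs from the first "ab" to the LAST "cd", so there is one big match.
        System.out.println(countMatches("(ab).*(cd)", input));           // prints 1

        // Lookahead version: each character is consumed only if "cd" does not start there,
        // so every match stops at the nearest "cd".
        System.out.println(countMatches("(ab)(?:(?!cd).)*(cd)", input)); // prints 2
    }
}
```

With the greedy pattern the whole input is swallowed as a single match, which is why only one replacement happened in the original code.
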
  • Can you tell us what leftContext and rightContext are? And give us an example of a failing match. Commented Nov 27, 2010 at 9:05
  • Getting the regex matcher to capture overlapping things is usually a slightly tricky matter, not something that happens by default. Without seeing the contents of the patterns, it's hard to say what is happening. It would basically require lookarounds to get the matcher to go over the same part of the string more than once. Are you doing that? Commented Nov 27, 2010 at 13:52
  • rightContext and leftContext are plain Strings, e.g. leftContext="ab", rightContext="cd". Commented Nov 29, 2010 at 5:12
  • The * quantifier is greedy by default; the regex you describe would not produce multiple matches. Why don't you post a complete example? Commented Nov 29, 2010 at 5:25
  • Ah, so that's what's happening. Any chance I can change the default behavior of * to be non-greedy? Commented Nov 29, 2010 at 8:24

2 Answers

Your use of the word "overlapping" is confusing. Apparently, what you meant was that the regex is too greedy, matching everything from the first leftContext to the last rightContext. It seems you figured that out already--and came up with a better approach as well--but there's still at least one potential problem.

You said leftContext and rightContext are "plain Strings", by which I assume you meant they aren't supposed to be interpreted as regexes, but they will be. You need to escape them, or any regex metacharacters they contain will cause incorrect results or run-time exceptions. The same goes for your replacement string, although only $ and the backslash have special meanings there. Here's an example (notice the non-greedy .*?, too):

public static String replaceWithContext(String input, String leftContext, String rightContext, String newString){
  String lcRegex = Pattern.quote(leftContext);
  String rcRegex = Pattern.quote(rightContext);
  String replace = Matcher.quoteReplacement(newString);
  Pattern pat = Pattern.compile("(" + lcRegex + ").*?(" + rcRegex + ")", Pattern.DOTALL);
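
For reference, the fragment above could be completed into a runnable method along these lines. This is only a sketch: the wrapper class ContextReplacer and the sample inputs in main are assumptions for illustration, and the loop reuses the appendReplacement/appendTail approach from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not from the original post.
class ContextReplacer {
    static String replaceWithContext(String input, String leftContext,
                                     String rightContext, String newString) {
        // Quote both contexts so metacharacters such as '.' or '$' match literally.
        String lcRegex = Pattern.quote(leftContext);
        String rcRegex = Pattern.quote(rightContext);
        // Escape '$' and '\' so the replacement text is inserted literally.
        String replace = Matcher.quoteReplacement(newString);
        Pattern pat = Pattern.compile("(" + lcRegex + ").*?(" + rcRegex + ")", Pattern.DOTALL);

        Matcher matcher = pat.matcher(input);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // $1 and $2 are the captured contexts; only the text between them is replaced.
            matcher.appendReplacement(buffer, "$1" + replace + "$2");
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithContext("ab 1 cd ab 2 cd", "ab", "cd", "X")); // abXcd abXcd
        // Contexts and replacement containing regex/replacement metacharacters.
        System.out.println(replaceWithContext("a.b 1 c$d", "a.b", "c$d", "$1"));    // a.b$1c$d
    }
}
```

Without Pattern.quote, the second call would treat "c$d" as a regex containing an end-of-input anchor and fail to match; without Matcher.quoteReplacement, "$1" in the new string would be read as a group reference.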

One other thing: if you aren't doing any post-match processing on the matched text, you can use replaceAll instead of rolling your own with appendReplacement and appendTail:

return input.replaceAll("(?s)(" + lcRegex + ")" +
                        "(?:(?!" + rcRegex + ").)*" +
                        "(" + rcRegex + ")",
    "$1" + replace + "$2");

There are a few possibilities, depending on what you really need.

You can append $ at the end of your regex, like this:

"(" + leftContext + ")" + ".*" + "(" + rightContext + ")$"

so if rightContext isn't the last thing, your regex won't match.

Next, you can capture everything after rightContext:

"(" + leftContext + ")" + ".*" + "(" + rightContext + ")(.*)"

and after that discard everything in your third matching group.

But, since we don't know what leftContext and rightContext really are, maybe your problem lies within them.

1 Comment

Mm, not sure how that will work in my code; I cannot just discard parts of the input string.
