
I'm currently trying to solve a problem from codingbat.com with regular expressions.

I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.

Here is the prompt: Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.

wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"

etc
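Since the question mentions a plain String-method solution, here is a minimal sketch of one for reference (the class name `WordEndsLoop` is just for illustration). It does not use regex, but it is handy for checking the regex answers below against the prompt.

```java
// Reference wordEnds using only String methods (no regex).
class WordEndsLoop {
    static String wordEnds(String str, String word) {
        StringBuilder out = new StringBuilder();
        int len = str.length(), wlen = word.length();
        // Try every start index where word could fit; i advances by one,
        // so back-to-back occurrences are all found.
        for (int i = 0; i + wlen <= len; i++) {
            if (str.startsWith(word, i)) {
                if (i > 0) out.append(str.charAt(i - 1));             // char before
                if (i + wlen < len) out.append(str.charAt(i + wlen)); // char after
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // c13i
        System.out.println(wordEnds("XY123XY", "XY"));       // 13
        System.out.println(wordEnds("XY1XY", "XY"));         // 11
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // cxziij
    }
}
```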

My code thus far:

String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);

String newStr = "";
while(m.find())
    newStr += m.group().replace(word, "");

return newStr;

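The problem is that when two instances of word are separated by a single character, the program misses that character as the "before" char of the second instance: the previous match has already consumed it, and m.find() resumes searching after the end of that match.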

For example, wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", without repeating the "i".
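To see the consumption problem concretely, this small sketch (the class `ShowConsumedMatches` is hypothetical) collects every match the question's pattern finds. The "i" is used up as the "after" char of the second match, so the third match can only start at the final "1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects each match of the question's pattern ".?" + word + ".?".
// Each find() resumes where the previous match ended, so a character
// consumed as the "after" char can't be reused as the next "before" char.
class ShowConsumedMatches {
    static List<String> matches(String str, String word) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(".?" + word + ".?").matcher(str);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches("abc1xyz1i1j", "1")); // [c1x, z1i, 1j]
    }
}
```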

I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.

  • See this answer about look-around regular expressions stackoverflow.com/a/2995621/324900 Commented Nov 3, 2012 at 19:14
  • @user1796994 See my undeleted, repaired answer for a one-line solution Commented Nov 4, 2012 at 0:51
  • @user1796994 See my (edited) answer for how to do it in just one line (including test code). You may not consider it "non-messy", but it's sure less messy than a many-line solution IMHO. Commented Nov 4, 2012 at 11:15

3 Answers


This is a one-liner solution:

String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");

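The first alternative is a lookahead inside a non-capturing group: it handles your edge case (one char sitting between two occurrences of the word) by capturing that char without consuming it, so the next match can use it as its "before" char. Otherwise the usual (consuming) alternative matches the char after the word, then skips ahead, stopping just before the next occurrence's preceding char (or at the end of the string).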

Note that your requirements don't call for iteration; only your question title assumes it's necessary, and it isn't: a single replaceAll does the job.

Note also that, to be absolutely safe, you should escape all characters in word in case any of them are special regex characters (such as . or *). If you can't guarantee that, use Pattern.quote(word) instead of word.
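A tiny illustration of why quoting matters (the class `QuoteDemo` and helper `fullMatch` are illustrative names): an unquoted "." is the regex "any character", while `Pattern.quote` turns it into a literal dot.

```java
import java.util.regex.Pattern;

// Shows why Pattern.quote matters when word may contain regex metacharacters.
class QuoteDemo {
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String word = ".";
        // Unquoted, "." means "any character", so it matches "x".
        System.out.println(fullMatch(word, "x"));                // true
        // Quoted, it only matches a literal dot.
        System.out.println(fullMatch(Pattern.quote(word), "x")); // false
        System.out.println(fullMatch(Pattern.quote(word), ".")); // true
    }
}
```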

Here's a test of the usual case and the edge case, showing it works:

import java.util.regex.Pattern;

public class WordEnds {
    public static String wordEnds(String input, String word) {
        word = Pattern.quote(word); // add this line to be 100% safe
        return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY"));
        System.out.println(wordEnds("abc1xyz1i1j", "1"));
    }
}

Output:

c13i
cxziij
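It may be worth also running the one-liner on the other examples from the question. The harness below (hypothetical class `OneLinerCheck`) does that. By my reading of the pattern, every match needs a char before the word, and for "XY123XY" and "XY1XY" no occurrence has both a char before it and something the alternatives can match after it, so the pattern never matches and replaceAll hands back the input unchanged rather than "13" and "11".

```java
import java.util.regex.Pattern;

// Runs the one-liner from this answer on the question's examples.
class OneLinerCheck {
    static String wordEnds(String input, String word) {
        word = Pattern.quote(word);
        return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"abcXY123XYijk", "XY"}, {"abc1xyz1i1j", "1"},
            {"XY123XY", "XY"}, {"XY1XY", "XY"}
        };
        for (String[] c : cases) {
            // Word at the very start/end: no match, so the input comes back as-is.
            System.out.println(c[0] + " -> " + wordEnds(c[0], c[1]));
        }
    }
}
```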

3 Comments

This isn't quite right - I'm going to come back to this later
@Bohemian that is incorrect: he needs cxziij as output, not cxzi. That is the reason I used lookarounds.
@Fake.It.Til.U.Make.It Although I previously stated this wasn't a solution, I have figured out the regex that actually (really) works; see the edited answer for a fully working one-line solution.

Use a positive lookbehind and a positive lookahead, which are zero-width assertions:

(?<=(.)|^)1(?=(.)|$)
    ^     ^   ^- looks for a character after 1 and captures it in group 2 (or matches the end of the string via $)
    |     |- matches 1; you can replace it with any word
    |
    |- looks for a character just before 1 and captures it in group 1 (or matches the start of the string via ^). This is a zero-width assertion: it doesn't move forward, it is just a test, which lets us capture the value without consuming it.

Groups 1 and 2 ($1 and $2) contain your values; a group is null when ^ or $ matched instead of a character. Keep calling find() until the end of the string.
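The zero-width claim can be checked by printing each match's span next to the captured groups. In this sketch (hypothetical class `LookaroundDemo`) every match covers only the "1" itself, and the "i" at index 8 is captured twice: as the char after the second 1 and the char before the third.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shows that the lookarounds in (?<=(.)|^)1(?=(.)|$) capture the neighbouring
// characters without consuming them: each match spans only the "1" itself.
class LookaroundDemo {
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("(?<=(.)|^)1(?=(.)|$)").matcher(s);
        while (m.find()) {
            // Format: before[match]after at start-end (null if ^ or $ matched)
            out.add(String.format("%s[%s]%s at %d-%d",
                    m.group(1), m.group(), m.group(2), m.start(), m.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        // [c[1]x at 3-4, z[1]i at 7-8, i[1]j at 9-10]
        System.out.println(describe("abc1xyz1i1j"));
    }
}
```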

So the code looks like this:

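import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s1 = "abcXY123XYiXYjk";
String s2 = Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)" + s2 + "(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
while (m.find()) {
    // group(1)/group(2) are null when ^ or $ matched; skip them so "null" isn't appended
    if (m.group(1) != null) s3 += m.group(1);
    if (m.group(2) != null) s3 += m.group(2);
}
// s3 now contains c13iij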

works here

7 Comments

-1 Waaaaaay too complicated, and actually wrong. You don't need lookarounds! Just use (.) - he says "don't match if there isn't a character", but you're overachieving by matching start and end, which is actually not what the OP says he wants
@Bohemian I liked your original answer because of its simplicity, so I would appreciate if you could post that (with str.replace)
@Bohemian can you name even a single case where this regex would fail?
@Fake.It.Til.U.Make.It I've taken my Prozac and feel much better now; I've removed my -1. That was a bit harsh. My criticism stands, though, that it's too complicated.
@Bohemian How is it too complicated? Matching start and end is needed for cases like 'aaaX', and look arounds are needed for cases like 'aXaXa'. Removing them will stop the edge cases from being handled correctly.
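The two edge cases named in that comment can be checked directly. Below is a sketch of the lookaround approach as a codingbat-style method (the class `LookaroundWordEnds` is illustrative); null groups from the ^ and $ branches are skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The lookaround approach from this answer as a wordEnds method. Groups that
// matched ^ or $ instead of a character are null and are skipped.
class LookaroundWordEnds {
    static String wordEnds(String str, String word) {
        Pattern p = Pattern.compile("(?<=(.)|^)" + Pattern.quote(word) + "(?=(.)|$)");
        Matcher m = p.matcher(str);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) out.append(m.group(1)); // char before, if any
            if (m.group(2) != null) out.append(m.group(2)); // char after, if any
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("aaaX", "X"));        // a
        System.out.println(wordEnds("aXaXa", "X"));       // aaaa
        System.out.println(wordEnds("XY1XY", "XY"));      // 11
        System.out.println(wordEnds("abc1xyz1i1j", "1")); // cxziij
    }
}
```

Without the ^ branch, the lookbehind would fail for a word at index 0, so that occurrence (and its "after" char) would be skipped entirely; the $ branch plays the same role at the end.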

Use regex as follows:

String c = ""; // a = input string, b = word
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
while (m.find()) c += m.group(1) + m.group(2);

Check this demo.
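Here the char after the word is only peeked at by the lookahead, which is why "abc1xyz1i1j" works; the char before, however, is consumed by (.|). By my reading, that means an occurrence that immediately follows another one loses its "before" char. The sketch below (hypothetical class `ConsumingCheck`) wraps this pattern in a method: for "aXYXYb" the prompt calls for "aXYb", but this approach gives "aXb".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// This answer's pattern wrapped in a method: (.|) consumes the char before
// the word, while (?=(.?)) only peeks at the char after it.
class ConsumingCheck {
    static String wordEnds(String a, String b) {
        Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
        StringBuilder c = new StringBuilder();
        while (m.find()) {
            c.append(m.group(1)).append(m.group(2)); // both groups always participate
        }
        return c.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abc1xyz1i1j", "1")); // cxziij
        System.out.println(wordEnds("XY1XY", "XY"));      // 11
        // Back-to-back occurrences: the "Y" before the second XY was consumed
        // by the first match, so this prints aXb rather than aXYb.
        System.out.println(wordEnds("aXYXYb", "XY"));
    }
}
```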

