27

In the following code:

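import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+\\D+\\d+").matcher("2abc3abc4abc5");
        // find() resumes after the end of the previous match, so matches never overlap
        while (m.find()) {
            allMatches.add(m.group());
        }

        String[] res = allMatches.toArray(new String[0]);
        System.out.println(Arrays.toString(res));
    }
}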

The result is:

[2abc3, 4abc5]

I'd like it to be

[2abc3, 3abc4, 4abc5]

How can it be achieved?

4
  • You would need to search starting at every index; use the find(int startingIndex) method and search starting at every character position. Of course, then you're likely to find too many matches... Assuming you want to start at every number, you might try combining an iteration over Matcher.find(String.indexOf(digits, index)) for all matching indices. Commented Jul 31, 2013 at 13:17
  • I suppose if it's single digits, you could back up from the match starting position and find from there for the next match. Commented Jul 31, 2013 at 13:20
  • 1
    For input "12abc13abc14abc15", do you want [12abc13, 2abc13, 13abc14, 3abc14, 14abc15, 4abc15] or [12abc13, 13abc14, 14abc15]? Commented Jul 31, 2013 at 13:22
  • @johnchen902: the latter. The solution handles this. Commented Jul 31, 2013 at 13:36

3 Answers

18

Make the matcher start its next scan at the second \d+: capture it in a group and pass that group's start index to find(int).

Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher("2abc3abc4abc5");
if (m.find()) {
    do {
        allMatches.add(m.group());
    } while (m.find(m.start(1)));
}
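
For completeness, here is a minimal self-contained sketch of the same idea. The class and method names are made up for illustration; the pattern and the find(int) restart are taken from the snippet above. Since find(int) returns false when nothing matches, a plain while loop also covers the no-match case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class RestartFromLastNumber {

    // Collects matches of \d+\D+\d+, restarting each scan at the trailing number.
    static List<String> overlappingMatches(String input) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+\\D+(\\d+)").matcher(input);
        int from = 0;
        // find(int) resets the matcher and searches from the given index
        while (m.find(from)) {
            allMatches.add(m.group());
            // group 1 is the trailing number; it starts the next match
            from = m.start(1);
        }
        return allMatches;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("2abc3abc4abc5"));     // [2abc3, 3abc4, 4abc5]
        System.out.println(overlappingMatches("12abc13abc14abc15")); // [12abc13, 13abc14, 14abc15]
        System.out.println(overlappingMatches("no digits here"));    // []
    }
}
```

Because each scan restarts at the first digit of the trailing number, multi-digit numbers stay whole.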

1 Comment

To the first two up-voters: the original version contained a bug: if nothing matched, an IllegalStateException was thrown.
16

Not sure if this is possible in Java, but in PCRE you could do the following:
(?=(\d+\D+\d+)).

Explanation
The technique is to put a capturing group inside a lookahead, and then "eat" one character to move forward.

  • (?= : start of positive lookahead
    • ( : start of capturing group 1
      • \d+ : match a digit one or more times
      • \D+ : match a non-digit character one or more times
      • \d+ : match a digit one or more times
    • ) : end of group 1
  • ) : end of lookahead
  • . : match any single character; this consumes one character to "move forward".


Thanks to Casimir et Hippolyte, it really does seem to work in Java. You just need to double the backslashes and read the first capturing group: (?=(\\d+\\D+\\d+)).. Tested on www.regexplanet.com.
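
A minimal Java sketch of the lookahead approach, for anyone who wants to try it locally. The class and method names are made up for illustration; the pattern is the one from this answer. The key point is to read group(1) rather than group(), because the overall match is only the single character consumed by the trailing dot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class LookaheadOverlap {

    static List<String> overlappingMatches(String input) {
        // The lookahead captures the real match in group 1 without consuming it;
        // the trailing "." then consumes one character so the scan moves forward.
        Matcher m = Pattern.compile("(?=(\\d+\\D+\\d+)).").matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1)); // group() itself would be just one character
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("2abc3abc4abc5"));
        // [2abc3, 3abc4, 4abc5]
        System.out.println(overlappingMatches("12abc13abc14abc15"));
        // [12abc13, 2abc13, 13abc14, 3abc14, 14abc15, 4abc15]
    }
}
```

With multi-digit numbers this also reports matches that start in the middle of a number, as the second output line shows.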

9 Comments

@anubhava It gives the right results for PCRE. That's what I stated anyways.
Yes I just meant that in Java it didn't give expected results.
Not really working. Use 12abc13abc14abc15 as input and the result is [12abc13, 2abc13, 13abc14, 3abc14, 14abc15, 4abc15] instead of [12abc13, 13abc14, 14abc15]. See my and OP's comments under the question.
@johnchen902 Overlapping matches! That's what he wanted to begin with... If that's not the case, then (?=(\d+\D+\d+))\d+ would do the job.
@johnchen902: right, you must replace the pattern with (?=((?<!\\d)\\d+\\D+\\d+)) if you don't want overlapped results in your overlapped results. :)
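
A short sketch of the lookbehind variant from the last comment, using the multi-digit input discussed under the question. The class and method names are made up for illustration. The pattern has no consuming character, so every match is empty; this sketch relies on Java's Matcher.find() stepping one character past an empty match by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class WholeNumberOverlap {

    static List<String> overlappingMatches(String input) {
        // (?<!\d) rejects start positions that sit in the middle of a number
        Matcher m = Pattern.compile("(?=((?<!\\d)\\d+\\D+\\d+))").matcher(input);
        List<String> result = new ArrayList<>();
        // each match is zero-length; find() advances past it on the next call
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(overlappingMatches("12abc13abc14abc15"));
        // [12abc13, 13abc14, 14abc15]
    }
}
```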
3

HamZa's solution above works perfectly in Java. To find overlapping matches of a specific pattern in a text, all you have to do is:

String regex = "\\d+\\D+\\d+";

String updatedRegex = "(?=(" + regex + ")).";

Here regex is the pattern you are looking for. To get overlapping matches, wrap it with "(?=(" at the start and "))." at the end, then read each result from group 1 instead of the whole match.
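
The wrapping could be packaged as a small helper, sketched below. The name findOverlapping is hypothetical, and the sketch assumes the supplied regex never matches the empty string and that no match needs to start at a line terminator (the trailing "." does not match those by default). Note that any capturing groups inside the supplied regex are shifted up by one, because the wrapper adds group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this sketch.
class OverlapHelper {

    // Returns every match of regex in input, including overlapping ones.
    static List<String> findOverlapping(String regex, String input) {
        // Wrap the pattern in a capturing lookahead and consume one character
        String updatedRegex = "(?=(" + regex + ")).";
        Matcher m = Pattern.compile(updatedRegex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group(1)); // the wrapped pattern's match
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findOverlapping("\\d+\\D+\\d+", "2abc3abc4abc5")); // [2abc3, 3abc4, 4abc5]
        System.out.println(findOverlapping("aba", "ababa"));                  // [aba, aba]
    }
}
```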
