
Using java.util.regex (JDK 1.6), the regular expression 201210(\d{5,5})Test applied to the subject string 20121000002Test only captures group(0) and does not capture group(1) (the substring 00002) as it should, given the code below:

Pattern p1 = Pattern.compile("201210(\\d{5,5})Test");
Matcher m1 = p1.matcher("20121000002Test");

if(m1.find()){
    for(int i = 1; i<m1.groupCount(); i++){
        System.out.println("number = "+m1.group(i));
    }
}

Curiously, a similar regular expression, 201210(\d{5,5})Test(\d{1,10}), applied to the subject string 20121000002Test0000000099 captures groups 0 and 1 but not group 2.

By contrast, with JavaScript's RegExp object, the exact same regular expressions applied to the exact same subject strings capture all groups, as one would expect. I checked and re-checked this myself using online regex testers.

Am I doing something wrong here? Or is it that Java's regex library really sucks?

    If you add / at the beginning and end of your RegExp in JavaScript, it returns a single group. Are you really sure about what you're saying? Have you prepared your own JavaScript test for this (i.e. no online editors)? Commented Oct 20, 2012 at 15:39

5 Answers


m1.groupCount() returns the number of capturing groups, i.e. 1 in your first case, so the body of this loop never runs: for(int i = 1; i<m1.groupCount(); i++)

It should be for(int i = 1; i<=m1.groupCount(); i++)
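For reference, here is a minimal, self-contained sketch of the corrected loop applied to the pattern and subject string from the question. The class name GroupLoopFix and the helper capturedGroups are made up for illustration; only Pattern, Matcher, find(), group(int) and groupCount() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupLoopFix {
    // Hypothetical helper: collects capturing groups 1..groupCount() of the first match.
    static List<String> capturedGroups(String regex, String subject) {
        List<String> groups = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(subject);
        if (m.find()) {
            // groupCount() does not count group 0, so the upper bound must be inclusive.
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        for (String g : capturedGroups("201210(\\d{5,5})Test", "20121000002Test")) {
            System.out.println("number = " + g); // prints "number = 00002"
        }
    }
}
```

With the original i < m1.groupCount() bound, the same loop would run zero times for this pattern, because groupCount() is 1.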


1 Comment

Thank you folks for all the answers! I just cannot believe it. It never crossed my mind that groupCount() would not include group 0, unlike JavaScript's RegExp exec(). It does not make much sense to me because, after all, group 0 is a damn group! Anyway, I guess I should've debugged the code in more depth...

Change the line

for(int i = 1; i<m1.groupCount(); i++){     

to

for(int i = 1; i<=m1.groupCount(); i++){      //NOTE THE = ADDED HERE    

It now works like a charm!


From java.util.regex.MatchResult.groupCount:

Group zero denotes the entire pattern by convention. It is not included in this count.

So iterate through groupCount() + 1.
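A small sketch of what that count looks like for the two patterns in the question, and of a loop that starts at group 0. The class name GroupCountDemo and the helper countGroups are assumptions for illustration. It relies on groupCount() depending only on the pattern, so it can be called before any match is attempted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: number of capturing groups in a regex (group 0 excluded).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("201210(\\d{5,5})Test"));            // prints 1
        System.out.println(countGroups("201210(\\d{5,5})Test(\\d{1,10})")); // prints 2

        Matcher m = Pattern.compile("201210(\\d{5,5})Test(\\d{1,10})")
                           .matcher("20121000002Test0000000099");
        if (m.find()) {
            // Starting at 0 also prints the whole match: groupCount() + 1 lines in total.
            for (int i = 0; i < m.groupCount() + 1; i++) {
                System.out.println("group " + i + " = " + m.group(i));
            }
        }
    }
}
```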

2 Comments

No, it's just groupCount(). The problem is that he's only going up to groupCount() - 1 now.
@Alan Same thing. It should be while i < groupCount() + 1 or while i <= groupCount(). Arguing correctness beyond that is silly. (I favor the former because <= is easy to miss in loop conditions.)

the regular expression "201210(\d{5,5})Test" applied to the subject string "20121000002Test" only captures group(0) and does not capture group(1)

Well, I admit I hadn't read the manual either, but here is what it says for Matcher.groupCount():

Returns the number of capturing groups in this matcher's pattern. Group zero denotes the entire pattern by convention. It is not included in this count.
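If you would rather work with an array where index 0 is the whole match, the way the result of JavaScript's exec() is laid out, something along these lines might do. This is only a sketch: the class ExecStyleGroups and its exec method are hypothetical names, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecStyleGroups {
    // Hypothetical helper: returns [whole match, group 1, ..., group n], or null if nothing matches.
    static String[] exec(Pattern pattern, String subject) {
        Matcher m = pattern.matcher(subject);
        if (!m.find()) {
            return null;
        }
        // One extra slot for group 0, which groupCount() does not count.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] r = exec(Pattern.compile("201210(\\d{5,5})Test"), "20121000002Test");
        System.out.println(r.length); // prints 2
        System.out.println(r[1]);     // prints 00002
    }
}
```

Iterating over such an array with i < result.length then behaves like the JavaScript version, since the array length already includes group 0.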

1 Comment

In Java, group(0) is the entire match, and group(1) is the first capturing group.
0
for (int i = 1; i <= m1.groupCount(); i++) { 
                   ↑
              your problem
