
I'm having trouble getting the right group of a regex match. My code boils down to the following:

Pattern fileNamePattern = Pattern.compile("\\w+_\\w+_\\w+_(\\w+)_(\\d*_\\d*)\\.xml");
Matcher fileNameMatcher = fileNamePattern.matcher("test_test_test_test_20110101_0000.xml");

System.out.println(fileNameMatcher.groupCount());

if (fileNameMatcher.matches()) {
    for (int i = 0; i < fileNameMatcher.groupCount(); ++i) {
        System.out.println(fileNameMatcher.group(i));
    }
}

I expect the output to be:

2
test
20110101_0000

However, it's:

2
test_test_test_test_20110101_0000.xml
test

Does anyone have an explanation?

4 Answers


group(0) is the whole match, and group(1), group(2), ... are the subgroups captured by the regular expression.
Why do you expect "test" to be among your groups? You didn't define a group to match "test" (your original regex contained only the group \d*_\d*).
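A minimal sketch of that numbering, using the two-group regex as it now appears in the question (after the asker's edit). The class name GroupIndexDemo and the helper group(...) are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // The regex from the question, which has two capturing groups after the edit.
    static final Pattern FILE_NAME =
            Pattern.compile("\\w+_\\w+_\\w+_(\\w+)_(\\d*_\\d*)\\.xml");

    // Returns group(index) if the whole input matches, otherwise null.
    static String group(String input, int index) {
        Matcher m = FILE_NAME.matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String name = "test_test_test_test_20110101_0000.xml";
        System.out.println(group(name, 0)); // group 0: the whole match
        System.out.println(group(name, 1)); // group 1: first (...) in the regex
        System.out.println(group(name, 2)); // group 2: second (...) in the regex
    }
}
```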


2 Comments

Thank you very much! Was not aware of that default behavior.
And yes, I missed a pair of parentheses in my code; I just edited it for correctness.

Group 0 is the whole match. The capturing groups start at 1, i.e. you need this:

System.out.println(fileNameMatcher.group(i + 1)); 
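The same shift wrapped in a small helper, sketched under the assumption that you only want the capturing groups and not the whole match. CapturingGroupsDemo and capturingGroups are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupsDemo {
    // Collects group(1)..group(groupCount()), skipping group 0.
    // Returns an empty list if the input doesn't match.
    static List<String> capturingGroups(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        if (m.matches()) {
            for (int i = 0; i < m.groupCount(); ++i) {
                result.add(m.group(i + 1)); // +1 because group 0 is the whole match
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w+_\\w+_\\w+_(\\w+)_(\\d*_\\d*)\\.xml");
        System.out.println(capturingGroups(p, "test_test_test_test_20110101_0000.xml"));
    }
}
```

Keeping the loop as `i < groupCount()` and reading `group(i + 1)` visits exactly the capturing groups, so nothing is skipped at the end.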

1 Comment

Thank you very much! Was not aware of that default behavior.
  • group(0) should be the entire match ("test_test_test_test_20110101_0000.xml");
  • group(1) should be the sole capture group in your regex ("20110101_0000").

This is what I am getting. I am puzzled as to why you'd be getting a different value for group(1).
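The original, un-edited regex isn't shown in the question. Assuming it was the current pattern without the parentheses around the fourth \w+, a sketch like this reproduces the values listed above. The reconstructed pattern, SingleGroupDemo and group(...) are all hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleGroupDemo {
    // Hypothetical reconstruction of the original regex:
    // only (\d*_\d*) is a capturing group.
    static final Pattern ORIGINAL =
            Pattern.compile("\\w+_\\w+_\\w+_\\w+_(\\d*_\\d*)\\.xml");

    // Returns group(index) if the whole input matches, otherwise null.
    static String group(String input, int index) {
        Matcher m = ORIGINAL.matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String name = "test_test_test_test_20110101_0000.xml";
        System.out.println(ORIGINAL.matcher(name).groupCount()); // one capturing group
        System.out.println(group(name, 0)); // the entire match
        System.out.println(group(name, 1)); // the sole capture group
    }
}
```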

1 Comment

Thanks, I got it now :) Sorry for the confusion; I forgot some parentheses in my code, see my edit.

Actually, your for loop should INCLUDE groupCount(), i.e. use "<=":

for (int i = 0; i <= fileNameMatcher.groupCount(); ++i) {
    System.out.println(fileNameMatcher.group(i));
}

Your output will then be:

2
test_test_test_test_20110101_0000.xml
test
20110101_0000

groupCount() does not count group 0, which matches the whole string.

The first group will be "test", as matched by (\w+), and the second group will be "20110101_0000", as matched by (\d*_\d*).
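The inclusive loop as a complete program, sketched for illustration (InclusiveLoopDemo, countGroups and allGroups are hypothetical names). It prints the four lines shown above, and it also shows that groupCount() can be called before any match attempt, since it only describes the pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InclusiveLoopDemo {
    // Number of capturing groups in the pattern; group 0 is not included,
    // and no successful match is needed to ask for it.
    static int countGroups(Pattern p) {
        return p.matcher("").groupCount();
    }

    // Returns group(0) through group(groupCount()), i.e. groupCount() + 1
    // entries, or an empty list if the input doesn't match.
    static List<String> allGroups(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); ++i) { // "<=" so the last group is included
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\w+_\\w+_\\w+_(\\w+)_(\\d*_\\d*)\\.xml");
        System.out.println(countGroups(p));
        for (String g : allGroups(p, "test_test_test_test_20110101_0000.xml")) {
            System.out.println(g);
        }
    }
}
```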
