1

I am using a regex to figure out which format the input date is in. This is one of the patterns I am using:

    ^((18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$

The constraint is that the year must be between 1850 and 2099. If I pass, for instance, the string 20011212 as the date and then extract the year, month and day from it, this is what I get: year: 2001, month: 200, day: 12. Any idea why?

    pattern = Pattern.compile(PATTERN);
    matcher = pattern.matcher(dateString);
    if (matcher.matches()){
       matcher.reset();
       if (matcher.find()){
          Integer.parseInt(matcher.group(1));
          Integer.parseInt(matcher.group(2));
          Integer.parseInt(matcher.group(3));
       }
    }

The code is simplified, but even on this simplified version, it returns erroneous results. Thank you for any suggestions/solutions.

2
  • What results are you getting? Commented Oct 25, 2012 at 9:21
  • year: 2001, month: 200, day: 12 Commented Oct 25, 2012 at 9:31

4 Answers

4

In a regex, everything you put inside (...) is a capturing group, and groups are numbered by the position of their opening parenthesis. The year part of your pattern contains two nested groups, and both of them capture:

    group(1) = ((18[5-9]|19[0-9]|20[0-9])\\d)
    group(2) = (18[5-9]|19[0-9]|20[0-9])
    group(3) = (0?[1-9]|1[012])
    group(4) = (0?[1-9]|[12][0-9]|3[01])
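To see this numbering in action, here is a minimal sketch that prints every group the question's pattern produces for the input 20011212. The class name GroupNumbering and the helper groups are made up for this illustration; only the pattern comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // The pattern from the question, unchanged (note the nested parentheses in the year).
    static final Pattern ORIGINAL = Pattern.compile(
        "^((18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$");

    // Hypothetical helper: returns group(0)..group(groupCount()), or null if there is no match.
    static String[] groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("20011212");
        for (int i = 1; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
        // Prints:
        // group(1) = 2001
        // group(2) = 200
        // group(3) = 12
        // group(4) = 12
    }
}
```

So group(2) is the inner three-digit prefix of the year, which is where the "month: 200" comes from; the real month and day sit in group(3) and group(4).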

You can also use non-capturing groups, written like this: (?:...)

So your pattern should be:

    ^((?:18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$
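As a rough sketch of how the extraction could look with this pattern: once the inner group is non-capturing, groups 1, 2 and 3 are year, month and day. The class DateParts and the method parse are hypothetical names, and returning null for a non-matching input is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateParts {
    // Same pattern as above; (?:...) groups the year prefix without capturing it.
    static final Pattern FIXED = Pattern.compile(
        "^((?:18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$");

    // Hypothetical helper: returns {year, month, day}, or null if the input does not match.
    static int[] parse(String dateString) {
        Matcher m = FIXED.matcher(dateString);
        // matches() already anchors to the whole input, so no reset()/find() is needed.
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)), // year
            Integer.parseInt(m.group(2)), // month
            Integer.parseInt(m.group(3))  // day
        };
    }

    public static void main(String[] args) {
        int[] p = parse("20011212");
        System.out.println("year: " + p[0] + ", month: " + p[1] + ", day: " + p[2]);
        // Prints: year: 2001, month: 12, day: 12
    }
}
```

After a successful matches() call the groups can be read directly, so the extra reset() and find() in the question's code are not required.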

3 Comments

The parentheses are needed there to allow another digit for the year; without them the year would be matched as only 3 digits.
Yes, but you need to specify that it is a non-capturing group.
+1 You are also right. Thank you for clarifying. I didn't know the trick for non-capturing groups.
4

The second group captures the first three digits of the year; use a non-capturing group for it:

    ^((?:18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$

2

Change your regex to ^(18[5-9]\\d|19[0-9]\\d|20[0-9]\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$. You had one pair of parentheses too many around the first 3 digits of the year, which created the second capturing group.

1 Comment

Yup, you were right. I thought I could put the \\d after all the constraints. Thanks.
1

This is because you have a capturing group (a pair of parentheses) inside the year regex. You can either:

  • count the opening parentheses and select the correct groups. This is hard to maintain if you ever change the regex.
  • use named groups. Not all regex flavors support this; Java does from Java 7 onwards, with the syntax (?<name>...).
  • use non-capturing groups.

A non-capturing group is denoted by ?: at the start of the group:

    ^((?:18[5-9]|19[0-9]|20[0-9])\\d)(0?[1-9]|1[012])(0?[1-9]|[12][0-9]|3[01])$
       ^^--- here

Note that look-arounds, such as (?= ... ) and (?! ... ), are non-capturing as well.
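For the named-group option in the list above, java.util.regex accepts (?<name>...) from Java 7 on, and Matcher.group(String) reads a group by its name. The following is only a sketch under that assumption; the group names year, month and day and the class NamedGroups are chosen for this illustration and do not appear in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroups {
    // Requires Java 7 or later. The year prefix stays non-capturing, so only the named groups capture.
    static final Pattern DATE = Pattern.compile(
        "^(?<year>(?:18[5-9]|19[0-9]|20[0-9])\\d)"
        + "(?<month>0?[1-9]|1[012])"
        + "(?<day>0?[1-9]|[12][0-9]|3[01])$");

    // Hypothetical helper: returns {year, month, day}, or null if the input does not match.
    static int[] parse(String dateString) {
        Matcher m = DATE.matcher(dateString);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group("year")),
            Integer.parseInt(m.group("month")),
            Integer.parseInt(m.group("day"))
        };
    }

    public static void main(String[] args) {
        int[] p = parse("20011212");
        System.out.println("year: " + p[0] + ", month: " + p[1] + ", day: " + p[2]);
        // Prints: year: 2001, month: 12, day: 12
    }
}
```

Reading groups by name means the extraction code keeps working even if parentheses are later added to or removed from the pattern.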
