
According to the Regex documentation, RegexOptions.ExplicitCapture makes the Regex capture only explicitly named groups such as (?<groupName>...), but in practice it behaves a little differently.

Consider these lines of code:

static void Main(string[] args) {
    Regex r = new Regex(
        @"(?<code>^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$))"
        , RegexOptions.ExplicitCapture
    );
    var x = r.Match("32/123/03");
    r.GetGroupNames().ToList().ForEach(gn => {
        Console.WriteLine("GroupName:{0,5} --> Value: {1}", gn, x.Groups[gn].Success ? x.Groups[gn].Value : "");
    });
}

When you run this snippet, the result contains a group named 0, even though my regex has no group named 0!

GroupName:    0 --> Value: 32/123/03  
GroupName: code --> Value: 32/123/03  
GroupName:   l1 --> Value: 32  
GroupName:   l2 --> Value: 123  
GroupName:   l3 --> Value: 03  
Press any key to continue . . .  

Could somebody please explain this behavior to me?

  • The zeroth group matches the entire regexp. Commented Apr 1, 2015 at 18:56
  • @AlexK. Do you mean I just have to ignore the first group? Commented Apr 1, 2015 at 19:07

2 Answers

You always have group 0: that's the entire match. Numbered groups start at 1 and are numbered by the ordinal position of the opening parenthesis that defines the group. Your regular expression (formatted for clarity):

(?<code>
  ^
  (?<l1> [\d]{2} )
  /
  (?<l2> [\d]{3} )
  /
  (?<l3> [\d]{2} )
  $
|
  ^
  (?<l1>[\d]{2})
  /
  (?<l2>[\d]{3})
  $
|
   (?<l1> ^[\d]{2} $ )
)
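
As a minimal sketch of the point about group 0 (the patterns and the class name GroupZeroDemo are made up for illustration, not taken from the question), the following shows that GetGroupNames() reports "0" even with RegexOptions.ExplicitCapture, and that the option only stops unnamed parentheses from capturing:

```csharp
using System;
using System.Text.RegularExpressions;

class GroupZeroDemo
{
    static void Main()
    {
        // No parentheses at all: the only group is number 0, the whole match.
        var plain = new Regex(@"\d+", RegexOptions.ExplicitCapture);
        Console.WriteLine(string.Join(",", plain.GetGroupNames()));   // 0

        // The unnamed (-\w+) is not captured under ExplicitCapture,
        // so only group 0 and the named group 'digits' are listed.
        var mixed = new Regex(@"(?<digits>\d+)(-\w+)?", RegexOptions.ExplicitCapture);
        Console.WriteLine(string.Join(",", mixed.GetGroupNames()));   // 0,digits

        Match m = mixed.Match("123-abc");
        Console.WriteLine(m.Groups[0].Value);         // 123-abc (entire match)
        Console.WriteLine(m.Groups["digits"].Value);  // 123
    }
}
```

So ExplicitCapture affects which parentheses create capture groups; it does not remove the implicit group 0.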

Your expression will backtrack, so you might consider simplifying your regular expression. This is probably clearer and more efficient:

static Regex rxCode = new Regex(@"
  ^                    # match start of input (no RegexOptions.Multiline), followed by
  (?<code>             # a mandatory group ('code'), consisting of
    (?<g1> \d\d )      # - 2 decimal digits ('g1'), followed by
    (                  # - an optional group, consisting of
      /                #   - a literal '/', followed by
      (?<g2> \d\d\d )  #   - 3 decimal digits ('g2'), followed by
      (                #   - an optional group, consisting of
        /              #     - a literal '/', followed by
        (?<g3> \d\d )  #     - 2 decimal digits ('g3')
      )?               #     - END: optional group
    )?                 #   - END: optional group
  )                    # - END: named group ('code'), followed by
  $                    # - end of input (or just before a final newline)
" , RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );

Once you have that, something like this:

string[] texts = { "12" , "12/345" , "12/345/67" , } ;

foreach ( string text in texts )
{
  Match m = rxCode.Match( text ) ;
  Console.WriteLine("{0}: match was {1}" , text , m.Success ? "successful" : "NOT successful" ) ;
  if ( m.Success )
  {
    Console.WriteLine( "  code: {0}" , m.Groups["code"].Value ) ;
    Console.WriteLine( "  g1: {0}" , m.Groups["g1"].Value ) ;
    Console.WriteLine( "  g2: {0}" , m.Groups["g2"].Value ) ;
    Console.WriteLine( "  g3: {0}" , m.Groups["g3"].Value ) ;
  }
}

produces the expected output:

12: match was successful
  code: 12
  g1: 12
  g2:
  g3:
12/345: match was successful
  code: 12/345
  g1: 12
  g2: 345
  g3:
12/345/67: match was successful
  code: 12/345/67
  g1: 12
  g2: 345
  g3: 67
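
If you enumerate GetGroupNames() as in the question and only want the named groups, one option is to skip the name "0" explicitly. This is a sketch under that assumption: the class name NamedGroupsOnly and the "(no match)" placeholder are illustrative, and the pattern is the one above written on a single line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class NamedGroupsOnly
{
    static void Main()
    {
        var rx = new Regex(
            @"^(?<code>(?<g1>\d\d)(/(?<g2>\d\d\d)(/(?<g3>\d\d))?)?)$",
            RegexOptions.ExplicitCapture);

        Match m = rx.Match("12/345");

        // Group 0 (the entire match) is always listed first; filter it out by name.
        foreach (string name in rx.GetGroupNames().Where(n => n != "0"))
        {
            Group g = m.Groups[name];
            // A group that did not take part in the match has Success == false.
            Console.WriteLine("{0,5}: {1}", name, g.Success ? g.Value : "(no match)");
        }
    }
}
```

For "12/345" this lists code, g1 and g2 with their values and reports g3 as not matched.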

1 Comment

+1 and thanks for the cleaner version of my regex. I knew it could be expressed more cleanly, but since it worked and I was lazy, I just kept it that way! I'll gratefully use your version of the regex and ignore the 0 group.

named group

^(?<l1>[\d]{2})/(?<l2>[\d]{3})/(?<l3>[\d]{2})$|^(?<l1>[\d]{2})/(?<l2>[\d]{3})$|(?<l1>^[\d]{2}$)


Try this (I removed the first group from your regex); see the demo.

3 Comments

It's still the same. The 0 group is up there; and on a side note, I need that code group to be captured. GroupName: 0 --> Value: 32/123/03 GroupName: l1 --> Value: 32 GroupName: l2 --> Value: 123 GroupName: l3 --> Value: 03 Press any key to continue . . .
Pattern "\d+" for text "123" returns an array with 1 group: 123. Pattern "(\d+)" for text "123" returns an array with 2 groups: 123 and 123. Pattern "(?<digit>\d+)" for text "123" also returns an array with 2 groups: 123 and 123. I think this is as it should be.
\d+ is not named. (?<some-name>\d+) would be named and this acts the same. I think there's something wrong with GetGroupNames() method and its interpretation of RegexOptions.ExplicitCapture.
