294

I'm having a hard time finding a good resource that explains how to use Named Capturing Groups in C#. This is the code that I have so far:

string page = Encoding.ASCII.GetString(bytePage);
Regex qariRegex = new Regex("<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>");
MatchCollection mc = qariRegex.Matches(page);
CaptureCollection cc = mc[0].Captures;
MessageBox.Show(cc[0].ToString());

However this always just shows the full line:

<td><a href="/path/to/file">Name of File</a></td> 

I have experimented with several other "methods" that I've found on various websites but I keep getting the same result.

How can I access the named capturing groups that are specified in my regex?

3
  • 3
    Backreference should be in the format (?<link>.*) and not (?<link>.*?) Commented May 25, 2009 at 14:05
  • 14
    FYI: If you are trying to store a named capture group inside an xml file then the <> will break it. You can use (?'link'.*) instead in this case. Not entirely relevant to this question but I landed here from a Google search of ".net named capture groups" so I'm sure other people are as well... Commented Apr 13, 2011 at 11:45
  • 1
    StackOverflow link with nice example: stackoverflow.com/a/1381163/463206 Also, @rtpHarry, No the <> will not break it. I was able to use the myRegex.GetGroupNames() collection as the XML element names. Commented Jun 29, 2012 at 17:23

7 Answers

298

Use the group collection of the Match object, indexing it with the capturing group name, e.g.

foreach (Match m in mc){
    MessageBox.Show(m.Groups["link"].Value);
}
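
As a fuller sketch of the same idea, here is the question's pattern wrapped in a small program. The class name NamedGroupDemo and the helper ExtractLinks are hypothetical, introduced only for illustration; the input string is the sample line from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same pattern as in the question; a verbatim string avoids backslash escapes.
    private static readonly Regex QariRegex =
        new Regex(@"<td><a href=""(?<link>.*?)"">(?<name>.*?)</a></td>");

    // Hypothetical helper: returns one (link, name) pair per match.
    public static List<KeyValuePair<string, string>> ExtractLinks(string page)
    {
        var results = new List<KeyValuePair<string, string>>();
        // The loop variable is declared as Match explicitly, because on older
        // frameworks MatchCollection only implements the non-generic IEnumerable.
        foreach (Match m in QariRegex.Matches(page))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value, m.Groups["name"].Value));
        }
        return results;
    }

    public static void Main()
    {
        string page = "<td><a href=\"/path/to/file\">Name of File</a></td>";
        foreach (var pair in ExtractLinks(page))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value); // /path/to/file -> Name of File
        }
    }
}
```

The Captures collection used in the question belongs to the match as a whole, which is why it only ever showed the full line; the named groups live under Groups.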

1 Comment

Don't use var m, since that would be an object.
127

To read a named capture group, pass its name as a string to the indexer of the Groups property of the resulting Match object.

Here is a small example:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        String sample = "hello-world-";
        Regex regex = new Regex("-(?<test>[^-]*)-");

        Match match = regex.Match(sample);

        if (match.Success)
        {
            Console.WriteLine(match.Groups["test"].Value);
        }
    }
}


11

The following code sample matches the pattern even when there are whitespace characters in between, i.e.:

<td><a href='/path/to/file'>Name of File</a></td>

as well as:

<td> <a      href='/path/to/file' >Name of File</a>  </td>

The method returns true or false, depending on whether the input htmlTd string matches the pattern. If it matches, the out parameters contain the link and the name respectively.

/// <summary>
/// Assigns proper values to link and name, if htmlTd matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetHrefDetails(string htmlTd, out string link, out string name)
{
    link = null;
    name = null;

    string pattern = "<td>\\s*<a\\s*href\\s*=\\s*(?:\"(?<link>[^\"]*)\"|(?<link>\\S+))\\s*>(?<name>.*)\\s*</a>\\s*</td>";

    if (Regex.IsMatch(htmlTd, pattern))
    {
        Regex r = new Regex(pattern,  RegexOptions.IgnoreCase | RegexOptions.Compiled);
        link = r.Match(htmlTd).Result("${link}");
        name = r.Match(htmlTd).Result("${name}");
        return true;
    }
    else
        return false;
}

I have tested this and it works correctly.

2 Comments

Thanks for reminding me that curly braces can access the groups. I prefer to stick to ${1} to keep things even simpler.
This completely answers the question, but it has some problems that are too long to explain here; I explained and corrected them in my answer below.
3

Additionally, if someone has a use case that needs the group names before running a search on the Regex object, they can use:

var regex = new Regex(pattern); // initialized somewhere
// ...
var groupNames = regex.GetGroupNames();
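
To make the result concrete, here is a minimal sketch using the pattern from the question. The names GroupNamesDemo and NamesOf are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNamesDemo
{
    // Hypothetical helper: no input is matched here, the names come from the pattern itself.
    public static string[] NamesOf(string pattern)
    {
        return new Regex(pattern).GetGroupNames();
    }

    public static void Main()
    {
        string pattern = "<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>";
        // The whole match is reported as group "0", followed by the named groups.
        Console.WriteLine(string.Join(", ", NamesOf(pattern))); // prints: 0, link, name
    }
}
```

Note that the returned array also contains "0" for the whole match, so it is not a list of the named groups only.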


3

This answer improves on Rashmi Pandit's answer, which is in a way better than the rest because it seems to completely resolve the exact problem detailed in the question.

The bad part is that it is inefficient and does not use the IgnoreCase option consistently.

It is inefficient because a regex can be expensive to construct and execute, and in that answer it could have been constructed just once (calling Regex.IsMatch constructs the regex again behind the scenes). The Match method could also have been called only once, with the result stored in a variable, and link and name could then call Result on that variable.

The IgnoreCase option was also only used in the Match part, not in the Regex.IsMatch part.

I also moved the Regex definition outside the method so that it is constructed just once (I think that is the sensible approach when the regex is built with the RegexOptions.Compiled option).

private static Regex hrefRegex = new Regex("<td>\\s*<a\\s*href\\s*=\\s*(?:\"(?<link>[^\"]*)\"|(?<link>\\S+))\\s*>(?<name>.*)\\s*</a>\\s*</td>",  RegexOptions.IgnoreCase | RegexOptions.Compiled);

public static bool TryGetHrefDetails(string htmlTd, out string link, out string name)
{
    var matches = hrefRegex.Match(htmlTd);
    if (matches.Success)
    {
        link = matches.Result("${link}");
        name = matches.Result("${name}");
        return true;
    }
    else
    {
        link = null;
        name = null;
        return false;
    }
}
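
For reference, Match.Result expands a replacement pattern, so "${link}" is only its simplest use. The sketch below uses a simplified pattern (not the one above) and hypothetical names (ResultDemo, Describe) purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ResultDemo
{
    // Simplified pattern, assumed for this sketch only: double-quoted href, no extra whitespace.
    private static readonly Regex HrefRegex =
        new Regex(@"<a href=""(?<link>[^""]*)"">(?<name>.*?)</a>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns "name (link)" or null when the input does not match.
    public static string Describe(string html)
    {
        Match match = HrefRegex.Match(html);
        if (!match.Success)
        {
            return null;
        }
        // Result expands the replacement pattern, so several groups can be combined in one call.
        return match.Result("${name} (${link})");
    }

    public static void Main()
    {
        string html = "<td><a href=\"/path/to/file\">Name of File</a></td>";
        Console.WriteLine(Describe(html)); // prints: Name of File (/path/to/file)

        // For a single group, Result("${link}") and Groups["link"].Value give the same string.
        Match match = HrefRegex.Match(html);
        Console.WriteLine(match.Result("${link}") == match.Groups["link"].Value); // prints: True
    }
}
```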


0

A quick guide for regexes in .NET is available here:

https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference

Regex matches are accessed via groups and captures.

Below is an example of an extension method that returns all capture values inside the matches.

public static class MatchCollectionExtensions{
    
    public static IEnumerable<string> GetCapturedValues(this MatchCollection matches){
        foreach (Match match in matches){
            foreach (Group group in match.Groups){
                foreach (Capture capture in group.Captures){
                    yield return capture?.Value;
                }
            }
        }
    }
    
}

Also, LINQPad is a great tool for learning C#.

Its Dump method shows the structure of objects.

Below is an example based on the sample code from the question.

string page = """
<td><a href="/path/to/file">Name of File</a></td> 
""";
Regex qariRegex = new Regex("<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>");
MatchCollection mc = qariRegex.Matches(page);
CaptureCollection cc = mc[0].Captures;

mc.Dump();

//mc[0].Groups[1].Captures[0].Value.Dump();
//mc[0].Groups[2].Captures[0].Value.Dump();

foreach (var element in mc.GetCapturedValues())
{
    Console.WriteLine(element);
}

[Image: LINQPad output showing matches, groups, and captures]

Iterating over the result of the extension method with Console.WriteLine gave the following output for your regex:

<td><a href="/path/to/file">Name of File</a></td>
/path/to/file
Name of File

Adjusting the extension method to build a Dictionary instead should be fairly straightforward: for example, use the Group name concatenated with the capture index as the key, and the capture value as the value of each entry.
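
One possible shape for such a dictionary, as a rough sketch rather than a definitive implementation: the class CaptureDictionaryDemo, the method ToCaptureDictionary, and the "name[index]" key format are all assumptions made for illustration. It works on a single Match so that keys cannot collide across matches.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDictionaryDemo
{
    // Hypothetical extension method: keys look like "link[0]" (group name plus capture index).
    public static Dictionary<string, string> ToCaptureDictionary(this Match match, Regex regex)
    {
        var result = new Dictionary<string, string>();
        // GetGroupNames also returns "0" for the whole match, so that entry is included too.
        foreach (string groupName in regex.GetGroupNames())
        {
            CaptureCollection captures = match.Groups[groupName].Captures;
            for (int i = 0; i < captures.Count; i++)
            {
                result[groupName + "[" + i + "]"] = captures[i].Value;
            }
        }
        return result;
    }

    public static void Main()
    {
        var regex = new Regex("<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>");
        Match match = regex.Match("<td><a href=\"/path/to/file\">Name of File</a></td>");
        foreach (KeyValuePair<string, string> entry in match.ToCaptureDictionary(regex))
        {
            Console.WriteLine(entry.Key + " = " + entry.Value);
        }
    }
}
```

For the sample line this yields the entries 0[0], link[0], and name[0]; a group that captures repeatedly (inside a quantifier) would add link[1], link[2], and so on.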

1 Comment

No, I wrote the extension method myself inside LINQPad. Yes, it only outputs the captured values, if any, and ignores their names. But as my answer says, you could build a dictionary instead and use the group name as the key, concatenated with the capture index inside the group.
0

I found this question when I wanted to iterate over only the explicit group names but not the numbers. I used the group names for replacement data.

For specific group names, Paolo Tedesco and Andrew Hare already gave the solution

Match match = regexPattern.Match(page);
Group capturedLinkGroup = match.Groups["link"];
…

System.Text.RegularExpressions.Match.Groups

and Rashmi Pandit gave the solution

Match match = regexPattern.Match(page);
string capturedLink = match.Result("${link}");

However, for iterating over the names only (not the numbers), Regex.GetGroupNames() on its own is not enough: it returns the names (and number strings) of all capturing groups.

This is the case even when using RegexOptions.ExplicitCapture, which retains only the named groups plus the whole match as group "0".

Therefore, I used

string[] groupNames = regexPattern.GetGroupNames()
        .Where(name => !int.TryParse(name, out _))
        .ToArray(); // needs using System.Linq; Where alone returns IEnumerable<string>

(for defensive programming)

but shorter code, coupled more tightly to the regex options, can look like this as well:

Regex regexPattern = new Regex(…, RegexOptions.ExplicitCapture);
…
string[] groupNames = regexPattern.GetGroupNames()[1 ..];
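
Putting the filtering approach together, a minimal runnable sketch could look like the following. The names NamedOnlyDemo and NamedGroups are hypothetical, and the pattern and input are borrowed from the question for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NamedOnlyDemo
{
    // Hypothetical helper: keeps only the group names that are not plain numbers such as "0".
    public static string[] NamedGroups(Regex regex)
    {
        return regex.GetGroupNames()
            .Where(name => !int.TryParse(name, out _))
            .ToArray();
    }

    public static void Main()
    {
        var regexPattern = new Regex("<td><a href=\"(?<link>.*?)\">(?<name>.*?)</a></td>");
        Match match = regexPattern.Match("<td><a href=\"/path/to/file\">Name of File</a></td>");

        // Prints "link = /path/to/file" and then "name = Name of File".
        foreach (string groupName in NamedGroups(regexPattern))
        {
            Console.WriteLine(groupName + " = " + match.Groups[groupName].Value);
        }
    }
}
```

The filter does not depend on ExplicitCapture being set, which is the defensive aspect: unnamed groups in the pattern are skipped as well.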
