
I have a string:

<graphic id="8374932">Translating Cowl (Inner/Outer Bondments</graphic>

And my pattern:

"<graphic id=\"(.*?)\">(.*?)</graphic>"

But it fails for the second group with the error "Not enough )'s." How can I prevent this?

  • It looks like you're trying to parse XML. Would you like help? Use LINQ to XML (recommended), use System.Xml, or use XPathDocument. Commented Sep 18, 2011 at 16:21
  • Using an online regex tester, this works fine. Does the error come from the method that is given the value in .Groups[2]? Commented Sep 18, 2011 at 16:22
  • @Austin: Good point, especially since that is the only place where there actually is a missing )... Commented Sep 18, 2011 at 16:27
  • I tested your search expression. It seems to work fine. Group1 = "8374932", Group2 = "Translating Cowl (Inner/Outer Bondments". Commented Sep 18, 2011 at 16:27
  • Didn't you accidentally switch the input and pattern parameters? Commented Sep 18, 2011 at 17:05

1 Answer

EDIT: First off, if your goal is to parse HTML or XML with regex, I strongly advise against it. If your goal is to learn, or to surgically grab a single element node, then regex may (and I do say may) be a tool to use. I am answering on the assumption that you are using the HTML pattern to learn from.

I believe you have confused your data with your pattern: the data is being handed to the regex parser as the pattern, and its unbalanced ( is what makes the parser fail.
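To see how swapped arguments would lead to this kind of failure, here is a minimal sketch. It assumes the static Regex.Match(input, pattern) overload was used; the variable names are mine, since the question does not show the calling code.

```csharp
using System;
using System.Text.RegularExpressions;

string data = @"<graphic id=""8374932"">Translating Cowl (Inner/Outer Bondments</graphic>";
string pattern = "<graphic id=\"(.*?)\">(.*?)</graphic>";

// Correct order: input first, pattern second.
Match ok = Regex.Match(data, pattern);
Console.WriteLine("Group 2: {0}", ok.Groups[2].Value);

// Swapped order (assumed mistake): the data is now parsed as a pattern,
// and its unbalanced "(" makes the regex parser reject it.
string swappedError = "";
try
{
    Regex.Match(pattern, data);
}
catch (ArgumentException ex) // invalid patterns throw ArgumentException or a subclass of it
{
    swappedError = ex.Message;
    Console.WriteLine("Swapped call failed: {0}", swappedError);
}
```

The first call succeeds and captures the text even though it contains a lone parenthesis, because parentheses in the input are just characters. Only the pattern is parsed for grouping syntax.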

I recommend the following:

  1. Don't use .*? to get text. It matches any character, so the pattern says nothing about what the text may contain. Be more specific in your pattern.
  2. Since you know that the text is enclosed in quotes or by >xxx<, use those characters as anchors.
  3. Once the anchors are determined, extract the text between them.
  4. Place the captured text into named capture groups.

How to get the text? Tell the regex parser to get everything that is not an anchor character by using a negated set: a ^ placed first inside [ ] means "not". For example, ([^\"]+) says: match one or more characters that are not a quote.

Change your pattern to this which demonstrates the above suggestions:

using System;
using System.Text.RegularExpressions;

string data = @"<graphic id=""8374932"">Translating Cowl (Inner/Outer Bondments</graphic>";

// \x22 is the hex escape for the double quote; it avoids escaping quotes inside the verbatim string.
string pattern = @"
(?:graphic\s+id=\x22)  # Match but don't capture (MBDC) the beginning of the element
(?<ID>[^\x22]+)        # Capture everything that is not a quote
(?:\x22>)              # MBDC the closing quote and >
(?<Content>[^<]+)      # Capture into Content all text that is not <
(?:</graphic>)         # MBDC the closing graphic tag";

// IgnorePatternWhitespace lets the pattern span lines and carry # comments; it does not otherwise change what is matched.
var mt = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine("ID: {0} Content: {1}", mt.Groups["ID"].Value, mt.Groups["Content"].Value);

// Outputs:
// ID: 8374932 Content: Translating Cowl (Inner/Outer Bondments
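
If the goal is extraction rather than regex practice, an XML parser avoids the pattern entirely, as I advised at the top. This is a minimal sketch using LINQ to XML (System.Xml.Linq); it assumes the input is a well-formed XML fragment, which the sample string is, and the variable names are only illustrative.

```csharp
using System;
using System.Xml.Linq;

string xml = @"<graphic id=""8374932"">Translating Cowl (Inner/Outer Bondments</graphic>";

// Parse the fragment into an element; malformed XML would throw here.
XElement graphic = XElement.Parse(xml);

// The explicit string conversion yields null if the attribute is missing.
string id = (string)graphic.Attribute("id");
string content = graphic.Value;

Console.WriteLine("ID: {0} Content: {1}", id, content);
```

The unbalanced parenthesis is ordinary text to the XML parser, so no escaping or pattern tuning is needed.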
