This is my code:

private static Regex paginationRegex = new Regex(
    "<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul></div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

static void Main(string[] args)
{
    string output = File.ReadAllText("output.html");

    var match = paginationRegex.Match(output);

    var lis = match.Groups["lis"].Value;
}

and this is my HTML in output.html:

<div class="pagination">
        <ul>
                <li><a href="javascript:searchPage('1')" class="arrowDeactiveLeftFirst"> </a></li>  
                            <li><a href="javascript:searchPage('1')" class="deActivateleftArrow"> </a></li>
                    <li>
                                    <a class="current" href="javascript:searchPage('1')">1</a>
                                </li>
          <li>
                                    <a href="javascript:searchPage('2')">2</a> 
                                </li>
          <li>
                                    <a href="javascript:searchPage('3')">3</a> 
                                </li>
                      <li><a href="javascript:searchPage('2')" class="rightArrow"> </a></li>
                          <li><a href="javascript:searchPage('730')" class="arrowRightLast"> </a></li>
              </ul>
      </div>

However, the lis group is always empty. What am I missing?

Comments

  • Do you mean <li> instead of <lis>? Commented May 15, 2014 at 10:58
  • He means var lis, @AndrewWhitaker. Commented May 15, 2014 at 11:00
  • @Neel: Right, I understand that var lis is empty, but the Regex is looking for a tag named <lis>. I'm asking if this should be <li> instead. Commented May 15, 2014 at 11:06
  • @Andrew and Charlie Hardis: That is called a "named group"; I am capturing groups. stackoverflow.com/questions/906493/… Commented May 15, 2014 at 11:09
  • @Jack: You're right. Didn't even know those existed. Commented May 15, 2014 at 11:25

1 Answer

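I think this is just because your pattern requires </div> to follow </ul> immediately, while your HTML has whitespace (a line break plus indentation) between the two closing tags. Because the match fails, the lis group comes back empty. Allowing optional whitespace between the two tags with \s* seems to fix the issue: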

//                                                                                  \/
Regex paginationRegex = new Regex("<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul>\\s*</div>",
                        RegexOptions.IgnoreCase | RegexOptions.Singleline);
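
To see the difference, here is a minimal sketch that runs both patterns against a shortened, made-up stand-in for output.html (the `sample` string and the variable names are my own, not from your project). It uses C# 9 top-level statements to stay short, and it checks `Match.Success` before reading the group, since a failed match silently yields an empty string for `Groups["lis"].Value`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, shortened stand-in for output.html: note the line break
// and indentation between </ul> and </div>.
string sample =
    "<div class=\"pagination\">\n" +
    "    <ul>\n" +
    "        <li><a href=\"javascript:searchPage('2')\">2</a></li>\n" +
    "    </ul>\n" +
    "</div>";

// Original pattern: </div> must follow </ul> with nothing in between.
var strict = new Regex(
    "<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul></div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

// Adjusted pattern: \s* tolerates whitespace between the closing tags.
var tolerant = new Regex(
    "<div class=\"pagination\">.*?<ul>(?<lis>.*?)</ul>\\s*</div>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);

Match strictMatch = strict.Match(sample);
Match tolerantMatch = tolerant.Match(sample);

Console.WriteLine($"strict matched:   {strictMatch.Success}");
Console.WriteLine($"tolerant matched: {tolerantMatch.Success}");

// Only read the named group when the match actually succeeded.
string lis = tolerantMatch.Success
    ? tolerantMatch.Groups["lis"].Value.Trim()
    : "";
Console.WriteLine(lis);
```

With this sample, only the second pattern should match, and `lis` should hold the single `<li>` element.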

I'm also obliged to mention that regular expressions often aren't the best tool for parsing HTML, since small markup changes like this one can break a pattern. Check out Html Agility Pack, a library built for parsing HTML.
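
For comparison, here is a rough sketch of the same extraction with Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed; the `html` string is a made-up, shortened version of your markup, and the XPath expression is just one way to select the list items, so adjust it to your real page.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical, shortened stand-in for output.html.
string html =
    "<div class=\"pagination\">\n" +
    "    <ul>\n" +
    "        <li><a href=\"javascript:searchPage('2')\">2</a></li>\n" +
    "        <li><a href=\"javascript:searchPage('3')\">3</a></li>\n" +
    "    </ul>\n" +
    "</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection items =
    doc.DocumentNode.SelectNodes("//div[@class='pagination']/ul/li");
int count = items?.Count ?? 0;
Console.WriteLine($"found {count} list items");

if (items != null)
{
    foreach (HtmlNode li in items)
    {
        // Look up the link inside each <li>; it may be missing.
        HtmlNode link = li.SelectSingleNode("./a");
        if (link != null)
        {
            Console.WriteLine(link.GetAttributeValue("href", ""));
        }
    }
}
```

Whitespace between tags does not matter here, because the query works on the parsed node tree rather than on the raw text.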
