I need to get numeric values from a web page into two variables.

A snippet from the page is below:

<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />

The strings "Downloads (current version):" and "Downloads (total):" each appear only once in the page.

I need to get the "123" and the "253" into variables.

Edit: Thanks to har07, I ended up with:

var downloadscurrentversion = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (current version):']/following-sibling::text()[1]");
var downloadsallversions = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (total):']/following-sibling::text()[1]");

Console.WriteLine("Total: " + downloadsallversions.InnerText.Trim());
Console.WriteLine("Current: " + downloadscurrentversion.InnerText.Trim());
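Since the goal is numeric values rather than strings, the two lookups above could be wrapped in a small helper that also parses the text. This is only a sketch: `DownloadCounts` and `ReadCount` are hypothetical names, the label is assumed to contain no single quotes, the count is assumed to be a plain integer with no thousands separators, and the sample HTML is a tidied copy of the snippet without the stray `</td>`.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

static class DownloadCounts
{
    // Hypothetical helper: returns the integer that follows the <b> label,
    // or null when the label is missing or the text is not a plain integer.
    public static int? ReadCount(HtmlDocument doc, string label)
    {
        // Assumes the label contains no single quotes (it is spliced into the XPath).
        var node = doc.DocumentNode.SelectSingleNode(
            "//b[.='" + label + "']/following-sibling::text()[1]");
        if (node == null)
        {
            return null; // SelectSingleNode returns null when nothing matches
        }

        // Decode entities such as &nbsp; before trimming and parsing.
        var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
        return int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out int value)
            ? value
            : (int?)null;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><b>Downloads (current version):</b> 123 <br />"
                   + "<b>Downloads (total):</b> 253<br /></div>");

        int? current = ReadCount(doc, "Downloads (current version):");
        int? total = ReadCount(doc, "Downloads (total):");

        Console.WriteLine("Current: " + (current.HasValue ? current.Value.ToString() : "not found"));
        Console.WriteLine("Total: " + (total.HasValue ? total.Value.ToString() : "not found"));
    }
}
```

Returning `int?` keeps the "not found" case explicit instead of throwing a `NullReferenceException` if the page layout changes. If the real page formats counts as something like "1,234", the parse would need `NumberStyles.AllowThousands` or similar.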

1 Answer

Consider this example:

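using System;
using HtmlAgilityPack;

var html = @"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />
</div>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// Select the non-blank text nodes that are direct children of the <div>
var result = htmlDoc.DocumentNode.SelectNodes("/div/text()[normalize-space(.)]");
// SelectNodes returns null (not an empty collection) when nothing matches
if (result != null)
{
    foreach (var r in result)
    {
        Console.WriteLine(r.InnerText.Trim());
    }
}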

In the XPath from the example above, this part:

/div/text()

selects all text nodes that are direct children of the <div> element. The last part:

[normalize-space(.)]

filters out text nodes that are empty or contain only whitespace.

UPDATE:

In response to your comment, you can try this instead:

var result = 
        htmlDoc.DocumentNode
               .SelectNodes(@"/div/b[.='Downloads (current version):' 
                                        or 
                                     .='Downloads (total):']/following-sibling::text()[1]");

The XPath above selects the first text node that follows each of the specified <b> elements.
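Because the combined query returns the values without saying which label each one belongs to, one option is to select the <b> elements first and then take the first following text node relative to each. The following is a rough sketch rather than the only way to do it: the `counts` dictionary is a hypothetical name, and the sample HTML is a tidied copy of the snippet.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var html = @"<div>
<b>Downloads (current version):</b> 123 <br />
<b>Downloads (total):</b> 253
<br />
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Maps each label's text to the trimmed text that follows it.
var counts = new Dictionary<string, string>();

var labels = doc.DocumentNode.SelectNodes(
    "//b[.='Downloads (current version):' or .='Downloads (total):']");

if (labels != null) // SelectNodes returns null when nothing matches
{
    foreach (var label in labels)
    {
        // Relative XPath: evaluated with the current <b> as the context node.
        var value = label.SelectSingleNode("following-sibling::text()[1]");
        if (value != null)
        {
            counts[label.InnerText] = value.InnerText.Trim();
        }
    }
}

foreach (var pair in counts)
{
    Console.WriteLine(pair.Key + " " + pair.Value);
}
```

Keying the results by label avoids relying on the order in which the two values appear in the page.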

3 Comments

Your code finds the 123 and 253 but I only included a small snippet of the complete HTML page. If I run it on the actual page I get a lot of extra results I don't want. Somehow the XPath needs to look specifically for the "Downloads (current version):" and "Downloads (total):" text and only return results after finding each of them.
Updated my answer to select the text node based on the preceding <b> elements.
Thanks, I ended up with:

var downloadscurrentversion = htmlDoc.DocumentNode.SelectNodes(@"//b[.='Downloads (current version):']/following-sibling::text()[1]");
var downloadsallversions = htmlDoc.DocumentNode.SelectNodes(@"//b[.='Downloads (total):']/following-sibling::text()[1]");
Console.WriteLine("Total: " + downloadsallversions.First().InnerText.Trim());
Console.WriteLine("Current: " + downloadscurrentversion.First().InnerText.Trim());
