I am having a problem reading some values from an HTML string using HtmlAgilityPack.
The two items I want to read are Newspaper : 82548828 and Fish : 8545852485.
But using the code I have written so far, I only ever get back the Newspaper item.
I suspect the XPath I am using is not fully correct. I think the XPath for the first loop is correct, as this gives me back the two items.
I want my second loop to loop over these two items (it thinks there are 6???).
Also, is div2.SelectSingleNode(sXPathT); the correct way to extract the group label, or is there a better way?
Thanks
Full Test Code Below
string strTestHTML = "<div class=\"content\" data-id=\"123456789\">" +
" <div class=\"m-group item\">" +
" <span class=\"group\">" +
" <a href=\"javascript:void(0);\">" +
" <span class=\"group-label\">Newspaper </span>" +
" <span class=\"group-value\">82548828</span>" +
" </a>" +
" </span>" +
" <span class=\"group\">" +
" <a href=\"javascript:void(0);\">" +
" <span class=\"group-label\">Fish </span>" +
" <span class=\"group-value\">8545852485</span>" +
" </a>" +
" </span>" +
" </div>" +
"</div>";
//<div class="content" data-id="123456789">
string sNewXpath = "//div[contains(@class,'content') and contains(@data-id, '" + "123456789" + "')]";
//<div class="m-group item">
string sSecondXPath = "/div[contains(@class,'m-group item')]";
//<span class="group"
string sThirdXPath = "//span[contains(@class,'group')]";
string sXPathT = "//span[contains(@class,'group-label')]";
string sXPathO = "//span[contains(@class,'group-value')]";
HtmlAgilityPack.HtmlDocument Doc = new HtmlDocument();
Doc.LoadHtml(strTestHTML);
foreach (HtmlNode div in Doc.DocumentNode.SelectNodes(sNewXpath + sSecondXPath))
{
foreach (HtmlNode div2 in div.SelectNodes(sThirdXPath))
{
var vOddL = div2.SelectSingleNode(sXPathT);
var vOddP = div2.SelectSingleNode(sXPathO);
string GroupLabel = vOddL.InnerText.Trim();
string GroupValue = vOddP.InnerText.Trim();
}
}
EDIT:
Worked out why I was getting 6 items back in the foreach loop: contains(@class,'group') also matches the group-label and group-value spans.
sThirdXPath was : string sThirdXPath = "//span[contains(@class,'group')]";
should be:
string sThirdXPath = "//span[@class='group']";
Still trying to find the right way to interrogate the HtmlNode contained in div2 to find the values of interest. I assume it needs an XPath that matches inside the current node, not across the whole HTML document.
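To make the two points above concrete, here is a minimal sketch. It assumes HtmlAgilityPack evaluates these expressions by the usual XPath 1.0 rules, where a leading "//" always starts from the document root and a leading "." starts from the current node. The class GroupDemo and its method names are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class GroupDemo
{
    // Same structure as the first test HTML in this question.
    public const string Html =
        "<div class=\"content\" data-id=\"123456789\">" +
        "<div class=\"m-group item\">" +
        "<span class=\"group\"><a href=\"javascript:void(0);\">" +
        "<span class=\"group-label\">Newspaper </span>" +
        "<span class=\"group-value\">82548828</span></a></span>" +
        "<span class=\"group\"><a href=\"javascript:void(0);\">" +
        "<span class=\"group-label\">Fish </span>" +
        "<span class=\"group-value\">8545852485</span></a></span>" +
        "</div></div>";

    // Counts the nodes an XPath expression selects in the sample HTML.
    public static int Count(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    // Reads each label/value pair using XPath relative to the group node.
    public static List<KeyValuePair<string, string>> ReadGroups()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        var result = new List<KeyValuePair<string, string>>();

        HtmlNodeCollection groups = doc.DocumentNode.SelectNodes("//span[@class='group']");
        if (groups == null) return result;

        foreach (HtmlNode group in groups)
        {
            // ".//" restricts the search to descendants of the current group node.
            HtmlNode label = group.SelectSingleNode(".//span[@class='group-label']");
            HtmlNode value = group.SelectSingleNode(".//span[@class='group-value']");
            if (label == null || value == null) continue;

            result.Add(new KeyValuePair<string, string>(
                label.InnerText.Trim(), value.InnerText.Trim()));
        }
        return result;
    }
}
```

With this sample, GroupDemo.Count("//span[contains(@class,'group')]") should give 6 and GroupDemo.Count("//span[@class='group']") should give 2, while ReadGroups() should return the Newspaper and Fish pairs rather than Newspaper twice.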
Updated HTML Sample:
<div class="content" data-id="123456789">
<div class="m-group item">
<span class="group">
<a href="javascript:void(0);">
<span class="group-label">Newspaper </span>
<span class="group-value">82548828</span>
</a>
</span>
<span class="group">
<a href="javascript:void(0);">
<span class="group-label">Fish </span>
<span class="group-value">8545852485</span>
</a>
</span>
</div>
</div>
<div class="content" data-id="987654321">
<div class="m-group item">
<span class="group">
<a href="javascript:void(0);">
<span class="group-label">Bread</span>
<span class="group-value">82548828</span>
</a>
</span>
<span class="group">
<a href="javascript:void(0);">
<span class="group-label">Milk </span>
<span class="group-value">8545852485</span>
</a>
</span>
</div>
</div>
In the above example, what is the correct XPath to access just Bread and its value and Milk and its value? I assume I need to filter on data-id="987654321" in the XPath?
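For reference, this is a sketch of what I would expect such a filter to look like, assuming an exact match on both the class and data-id attributes is enough for this markup. ContentDemo, ReadGroups and SampleHtml are hypothetical names used only for this example, and the sample HTML is a shortened copy of the updated sample above.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class ContentDemo
{
    // Shortened copy of the updated HTML sample (two content blocks).
    public const string SampleHtml =
        "<div class=\"content\" data-id=\"123456789\"><div class=\"m-group item\">" +
        "<span class=\"group\"><a href=\"javascript:void(0);\">" +
        "<span class=\"group-label\">Newspaper </span>" +
        "<span class=\"group-value\">82548828</span></a></span>" +
        "</div></div>" +
        "<div class=\"content\" data-id=\"987654321\"><div class=\"m-group item\">" +
        "<span class=\"group\"><a href=\"javascript:void(0);\">" +
        "<span class=\"group-label\">Bread</span>" +
        "<span class=\"group-value\">82548828</span></a></span>" +
        "<span class=\"group\"><a href=\"javascript:void(0);\">" +
        "<span class=\"group-label\">Milk </span>" +
        "<span class=\"group-value\">8545852485</span></a></span>" +
        "</div></div>";

    // Returns "label : value" strings for the content block with the given data-id.
    // Assumes dataId contains no quote characters, since it is concatenated into the XPath.
    public static List<string> ReadGroups(string html, string dataId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<string>();

        // Pick the one content div by data-id, then the group spans below it.
        string xpath = "//div[@class='content' and @data-id='" + dataId + "']" +
                       "//span[@class='group']";
        HtmlNodeCollection groups = doc.DocumentNode.SelectNodes(xpath);
        if (groups == null) return result; // null means no match

        foreach (HtmlNode group in groups)
        {
            // Relative XPath: look only inside the current group node.
            HtmlNode label = group.SelectSingleNode(".//span[@class='group-label']");
            HtmlNode value = group.SelectSingleNode(".//span[@class='group-value']");
            if (label == null || value == null) continue;

            result.Add(label.InnerText.Trim() + " : " + value.InnerText.Trim());
        }
        return result;
    }
}
```

Calling ContentDemo.ReadGroups(ContentDemo.SampleHtml, "987654321") should then return only the Bread and Milk entries, and passing "123456789" should return only Newspaper from the shortened sample.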