0

I need to parse an HTML string that I receive from a server.

<html>
<head/>
<body style="margin: 0;padding: 0">
    <a href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=319737742&amp;mt=8&amp;uo=6" style="margin: 0;padding: 0"><img src="https://s3.amazonaws.com/sportschatter/postcard.jpg" style="margin: 0;padding: 0"/></a>
</body>
</html>

This is the response I get from the server. I need to retrieve the img URL (https://s3.amazonaws.com/sportschatter/postcard.jpg) as well as the href value of the link. I have the HTML Agility Pack for WP7, but I don't know how to write the query to get this information. I tried something like this:

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlString);

var value = document.DocumentNode.Descendants("img src")
                    .Select(x => x.InnerText);

This does not give me any value. I also tried Regex:

string parseString = htmlstring;
Regex expression = new Regex(@".*img src=(\d+).*$");
Match match = expression.Match(parseString);
MessageBox.Show(match.Groups[1].Value);

but this does not work either. Please let me know what I am doing wrong.

1
  • Also, I have seen a lot of HTML Agility Pack examples; most of them use the SelectNodes method, which is not present in the WP7 version of the library. Commented Nov 8, 2011 at 11:34

2 Answers 2

2

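You've misunderstood how the LINQ to XML-style syntax is meant to be used (this is the route that works without XPath, which isn't supported on Windows Phone). Descendants takes an element name such as "img", not "img src"; the attribute is read from the node afterwards.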

You need to do something like this instead:

// Take the first <img> element, then read its src attribute.
var image = document.DocumentNode.Descendants("img").First();
var source = image.Attributes["src"].Value;
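
Since the question asks for the href as well as the img URL, here is a minimal sketch of how both could be read with the same Descendants approach. The class and method names (PostcardLinks, FirstImageSource, FirstLinkTarget) are made up for illustration; GetAttributeValue is the call reported to work on the WP7 build in the comments. Depending on the library version, the returned href may still contain the raw `&amp;` entities from the markup.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class PostcardLinks
{
    // Returns the src of the first <img>, or "" when the document has none.
    public static string FirstImageSource(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var image = document.DocumentNode.Descendants("img").FirstOrDefault();
        return image == null ? "" : image.GetAttributeValue("src", "");
    }

    // Returns the href of the first <a>, or "" when the document has none.
    public static string FirstLinkTarget(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var link = document.DocumentNode.Descendants("a").FirstOrDefault();
        return link == null ? "" : link.GetAttributeValue("href", "");
    }
}
```

FirstOrDefault is used instead of First so that a response without the expected element yields an empty string rather than an exception.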

2 Comments

Thanks, it worked. I had to change it slightly: var image = document.DocumentNode.Descendants("img").First(); var source = image.GetAttributeValue("src", ""); How do I get the text values that are generally in p tags?
Use the InnerHtml or InnerText property.
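
A rough sketch of that suggestion for p tags, assuming the same LINQ-style Descendants call is available; the class and method names (ParagraphText, Extract) are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class ParagraphText
{
    // Collects the trimmed InnerText of every <p> element in document order.
    public static List<string> Extract(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        return document.DocumentNode
                       .Descendants("p")
                       .Select(p => p.InnerText.Trim())
                       .ToList();
    }
}
```

InnerText drops nested markup and keeps only the text; swapping in InnerHtml would keep any tags inside each paragraph.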
-1

Use HtmlAgilityPack - do not use regex.
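
As for why the regex in the question returns nothing: `(\d+)` matches digits only, while the src value is a quoted URL, so the pattern never matches and group 1 stays empty. The sketch below only illustrates that mismatch. The Quoted pattern and the ImgSrcRegex class are assumptions made for this example, and a parser remains the sturdier choice for real markup.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class ImgSrcRegex
{
    // Pattern from the question: (\d+) accepts digits only, so a quoted URL never matches.
    public static readonly Regex Original = new Regex(@".*img src=(\d+).*$");

    // Assumed alternative: captures the text between double quotes after src=.
    public static readonly Regex Quoted =
        new Regex(@"<img\s[^>]*?src\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);

    // Returns capture group 1, or "" when the pattern does not match.
    public static string Capture(Regex pattern, string html)
    {
        Match match = pattern.Match(html);
        return match.Success ? match.Groups[1].Value : "";
    }
}
```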

The string passed to Descendants is an element name (such as "img"), not a CSS-like selector; XPath expressions are what SelectNodes takes, as in the examples linked below.

Here's an example: http://htmlagilitypack.codeplex.com/wikipage?title=Examples

Here's some info about XPath: http://msdn.microsoft.com/en-us/library/ms256086.aspx

2 Comments

Hi Jakub, the example link uses doc.DocumentElement.SelectNodes. I am using the HAPPhone version of the HTML Agility Pack, and it doesn't have a DocumentElement member.
XPath isn't supported on Windows Phone.
