2

I download a web page's HTML as a string using a WebClient.

However, I want to turn it into an HtmlDocument object so I can use the DOM features that class offers. Currently the only way I know to do this is with a WebBrowser control, as follows:

    string pageHtml = client.DownloadString(url);

    browser.ScriptErrorsSuppressed = true;
    browser.DocumentText = pageHtml;

    do
    {
        Application.DoEvents();
    } while (browser.ReadyState != WebBrowserReadyState.Complete);

    return browser.Document;

Is there another way of doing it? I know there are other browser controls available, but is there a simpler way?

3 Answers

7

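You can use HtmlAgilityPack. Note that it provides its own HtmlAgilityPack.HtmlDocument class, which is a different type from System.Windows.Forms.HtmlDocument. For example:

// Requires the HtmlAgilityPack package and "using System.Linq;"
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);  // parses the string in memory; no browser control needed

// Collect the inner text of every <div> in the document
var results = doc.DocumentNode
    .Descendants("div")
    .Select(n => n.InnerText);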
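
As a slightly fuller illustration of the same idea, here is a minimal sketch that pulls the `href` of every link out of an HTML string. The `LinkExtractor` class and `ExtractLinks` method are hypothetical names chosen for this example, and it assumes the HtmlAgilityPack package is referenced by the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of HtmlAgilityPack itself.
public static class LinkExtractor
{
    public static List<string> ExtractLinks(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        // Read the href attribute of each matched <a>, defaulting to an empty string
        return anchors
            .Select(a => a.GetAttributeValue("href", string.Empty))
            .ToList();
    }
}
```

Calling `LinkExtractor.ExtractLinks(pageHtml)` on the downloaded string should give the link targets without creating a WebBrowser control, at the cost of working with HtmlAgilityPack's node types rather than the WinForms DOM classes.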

1 Comment

I will consider that; however, I already have a lot of code that uses the HtmlDocument class, and I was looking for something I could plug in without needing to change everything.
2

I know this is an old post, but my reply is for others who come here like me.

If you want to do it in .NET code, here is what you have to do:

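// Requires a reference to System.Windows.Forms and an STA thread,
// because WebBrowser wraps an ActiveX control.
public System.Windows.Forms.HtmlDocument GetHtmlDocument(string html)
{
    WebBrowser browser = new WebBrowser();
    browser.ScriptErrorsSuppressed = true;
    // Assigning DocumentText starts loading the markup into the control.
    browser.DocumentText = html;
    // Open a fresh document and write the markup into it directly.
    browser.Document.OpenNew(true);
    browser.Document.Write(html);
    browser.Refresh();
    return browser.Document;
}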


0

I know this is an old topic, but here is my solution:

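// Requires: using System.IO; using System.Windows.Forms;
public static class HtmlHelpr
{
    // Extension method: call as "page.html".HtmlDocumentFromFile()
    public static HtmlDocument HtmlDocumentFromFile(this string PathToHtml)
    {
        // Caution: the using block disposes the WebBrowser when the method returns,
        // so the returned HtmlDocument may not remain usable afterwards.
        using (WebBrowser wb = new WebBrowser())
        {
            string s = File.ReadAllText(PathToHtml);
            wb.ScriptErrorsSuppressed = true;
            wb.DocumentText = s;
            var hd = wb.Document;
            hd.Write(s);
            return hd;
        }
    }
}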
