
I'm currently working on a personal project in C# using WPF and the WPF WebBrowser control. I need to explore HTML DOM elements the way we usually do in JavaScript, PHP, etc.

In my MainWindow I have this variable:

private mshtml.HTMLDocument mainDocument = new mshtml.HTMLDocument();

In my WebBrowser's LoadCompleted event handler I have this:

mainDocument = (mshtml.HTMLDocument) mainBrowser.Document;

OK, so far so good: this works.

Now if I do this:

mshtml.IHTMLElement elem = mainDocument.getElementById("MY_ID");

That works nicely too: I can read elem.innerHTML and similar properties.

BUT my problem is that only HTMLDocument has methods to find elements by ID, by tag name, etc.

I don't know how to search for elements inside an IHTMLElement. I tried things like casting IHTMLElement to IHTMLElement2, etc., but nothing worked.

If you have any ideas, please let me know. A lot of people talk about hosting the WinForms WebBrowser, but I think there must be a way to do this with mshtml alone.

Thanks a lot. If you need more information, please feel free to ask.

PS: I'm French, so I'm sorry about my English skills.

2 Answers


If you want to parse an HTML document in WinForms or WPF, you can use an excellent parser, Html Agility Pack. See http://html-agility-pack.net

using System.Linq;
using HtmlAgilityPack;

var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);

After loading the page into doc, you can get any attribute, tag, etc.

var value = doc.DocumentNode
    .SelectNodes("//td/input")
    .First()
    .Attributes["value"].Value;

It's very easy: explore the library a bit and you'll be able to make full use of it.
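Since the question is about searching inside one element rather than the whole document, here is a minimal sketch of how that looks with HtmlAgilityPack. It assumes the HtmlAgilityPack NuGet package is referenced; the class ScopedSearchDemo, the helper LinksInside and the sample markup are hypothetical. The container is looked up with GetElementbyId, then queried with a relative XPath.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ScopedSearchDemo
{
    // Hypothetical sample markup standing in for the page shown in the browser.
    public const string SampleHtml =
        "<html><body>" +
        "<div id='MY_ID'><a href='/inside-1'>one</a><p><a href='/inside-2'>two</a></p></div>" +
        "<a href='/outside'>three</a>" +
        "</body></html>";

    // Hypothetical helper: returns the href of every <a> nested anywhere
    // inside the element whose id is containerId.
    public static string[] LinksInside(string html, string containerId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same idea as document.getElementById in JavaScript.
        HtmlNode container = doc.GetElementbyId(containerId);
        if (container == null)
            return new string[0];

        // The leading "." makes the XPath relative to container, so only its
        // descendants are searched; "//a" would search the whole document.
        HtmlNodeCollection anchors = container.SelectNodes(".//a");
        if (anchors == null) // SelectNodes returns null when nothing matches
            return new string[0];

        return anchors
            .Select(a => a.GetAttributeValue("href", ""))
            .ToArray();
    }

    public static void Main()
    {
        foreach (string href in LinksInside(SampleHtml, "MY_ID"))
            Console.WriteLine(href);
    }
}
```

Run against SampleHtml, this should print /inside-1 and /inside-2 but not /outside. If you prefer LINQ to XPath, container.Descendants("a") yields the same nested anchors.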

You can also load an HtmlAgilityPack document directly from a (WinForms) WebBrowser control, like this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(webBrowser1.DocumentStream);

Or you can do it like this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
// Load() has no overload taking the browser's Document object; pass the HTML text instead
doc.LoadHtml(webBrowser1.DocumentText);

Thanks


4 Comments

Sure, but what if we already have an HTML document in the WPF WebBrowser's Document property? Could we access its entire HTML and initialize HtmlAgilityPack from that, without reloading the document?
Yes, you can do that. Below is how you will do it HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.Load(webBrowser1.DocumentStream); Or you can do like this HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); doc.Load(webBrowser1.Document); I just edited my above answer to include above change.
If an answer helps, please mark it as accepted; that way others can benefit too.
I would really like to, but I can't because I'm a new member. I will put more details in an answer.

Thanks a lot @Sujit for your help. I don't have enough reputation to mark your answer as helpful, but I hope others will.

To get it working with the WPF WebBrowser, I did this:

HtmlAgilityPack.HtmlDocument mainHTMLDoc = new HtmlAgilityPack.HtmlDocument();
mainHTMLDoc.LoadHtml((mainBrowser.Document as mshtml.HTMLDocument).documentElement.innerHTML);

To manipulate the results, you should also add this:

using System.Linq;

After that, you can do things like this:

var table = mainHTMLDoc.GetElementbyId("MyID");
var rows = table.Element("tbody").Elements("tr");
foreach (var row in rows)
{
    var cells = row.Elements("td").ToList();
    // First column: inner HTML of the first <a> inside the cell
    var datacol1 = cells[0].Descendants("a").First().InnerHtml;
    // Second column: plain text of the cell
    var datacol2 = cells[1].InnerText;
}

Without using System.Linq you cannot call LINQ extension methods such as Count(), ElementAt() or First() on the collections returned by Elements(), and they are very, very useful! Thanks again, Sujit :)

1 Comment

No problem, Mickael. All the best!
