3

I'm looking to parse HTML using .NET for the purposes of testing or asserting its content, e.g.:

    HtmlDocument doc = GetDocument("some html");
    List<Form> forms = doc.Forms();
    Link link = doc.GetLinkByText("New Customer");

The idea is to allow people to write tests in C# similar to how they do in Webrat (Ruby).

For example:

    visits('/')
    fills_in "Name", "mick"
    clicks "save"

I've seen the Html Agility Pack, SgmlReader etc., but has anyone created an object model for this, i.e. a set of classes representing the HTML elements, such as form, button etc.?

Cheers.

3
  • 2
    Html Agility Pack seems well suited to your purpose, but you will have to use XPath to query the document. Commented Jul 26, 2009 at 12:43
  • Did you actually read the question? Commented Jul 26, 2009 at 16:45
  • John Saunders pointed out to me that he wants an alternative to Html Agility Pack, but it seems to me that it is well suited to the purpose, and I wanted to point that out. Commented Jul 27, 2009 at 13:10

5 Answers

1

Here is a good library for HTML parsing. Objects like HtmlButton and HtmlInput are not created for you, but it is a good starting point from which to create them yourself if you don't want to use the HTML DOM.


0

The closest thing to an HTML DOM in .NET, as far as I can tell, is the HTML DOM.

You can use the Windows Forms WebBrowser control, load it with your HTML, then access the DOM from the outside.

BTW, this is .NET. Any code that works for VB.NET would work for C#.

4 Comments

I'd rather not start hosting UI controls for this. I'd then get into the usual threading issues with UI controls, and performance would suffer. I'm using this to test ASP.NET MVC pages and am avoiding Selenium etc. because of the browser overhead. What would be ideal is something like HtmlUnit (Java-based). I'm not sure I'd have the time to port it, as it's a monster. It also supports JavaScript, but I don't need that to test my apps (i.e. the JavaScript is unobtrusive).
From HtmlUnit:
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("htmlunit.sourceforge.net");
    final HtmlDivision div = page.getHtmlElementById("some_div_id");
    final HtmlAnchor anchor = page.getAnchorByName("anchor_name");
htmlunit.sourceforge.net
Not much formatting in comments. Surround with underscores or single asterisks or double asterisks or backQuotes<T> or maybe triple asterisks. But it's limited and meant to be that way.
Benefit of WebBrowser control - it's IE. It will behave like IE does. This would be important for AJAX scenarios or any other situation where some of the HTML is produced on the fly. You can actually find elements and invoke their click methods, to fire the JavaScript that would run if in a normal browser.
0

You have two major options:

  1. Use a browser engine (e.g. Internet Explorer) that will parse the HTML for you and then give you access to the generated DOM. This option requires some interop with the browser engine (in the case of IE, it's simple COM).

  2. Use a lightweight parser like HtmlAgilityPack.
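
As a rough illustration of option 2, here is a minimal sketch of thin wrapper classes over HtmlAgilityPack that give the kind of object model the question asks for. The names `HtmlPage`, `Form`, `Link`, `Forms()` and `GetLinkByText()` are hypothetical and are not part of HtmlAgilityPack. Only `HtmlDocument`, `HtmlNode`, `SelectNodes`, `GetAttributeValue`, `HtmlEntity.DeEntitize` and `HtmlNode.ElementsFlags` come from the library. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumed to be referenced, e.g. via NuGet

// Hypothetical wrapper around an <a> element.
public sealed class Link
{
    private readonly HtmlNode _node;
    public Link(HtmlNode node) { _node = node; }

    // Visible text with entities decoded and surrounding whitespace trimmed.
    public string Text => HtmlEntity.DeEntitize(_node.InnerText).Trim();
    public string Href => _node.GetAttributeValue("href", "");
}

// Hypothetical wrapper around a <form> element.
public sealed class Form
{
    private readonly HtmlNode _node;
    public Form(HtmlNode node) { _node = node; }

    public string Action => _node.GetAttributeValue("action", "");

    // Names of the named <input> elements nested inside this form.
    public List<string> FieldNames()
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var inputs = _node.SelectNodes(".//input[@name]");
        return inputs == null
            ? new List<string>()
            : inputs.Select(n => n.GetAttributeValue("name", "")).ToList();
    }
}

// Hypothetical entry point: parses the markup once and hands out typed wrappers.
public sealed class HtmlPage
{
    private readonly HtmlDocument _doc = new HtmlDocument();

    static HtmlPage()
    {
        // Some HtmlAgilityPack versions treat <form> as an empty element, so its
        // inputs end up as siblings. Removing the flag keeps them nested.
        HtmlNode.ElementsFlags.Remove("form");
    }

    public HtmlPage(string html) { _doc.LoadHtml(html); }

    public List<Form> Forms()
    {
        var nodes = _doc.DocumentNode.SelectNodes("//form");
        return nodes == null
            ? new List<Form>()
            : nodes.Select(n => new Form(n)).ToList();
    }

    // Returns the first link whose text matches exactly, or null if there is none.
    public Link GetLinkByText(string text)
    {
        var nodes = _doc.DocumentNode.SelectNodes("//a[@href]");
        return nodes == null
            ? null
            : nodes.Select(n => new Link(n)).FirstOrDefault(l => l.Text == text);
    }
}
```

A test could then do something like `new HtmlPage(html).GetLinkByText("New Customer")` and assert on `Href`, with no browser or UI control involved. Following links and submitting forms (the `visits` / `clicks` part of Webrat) would still need an HTTP layer on top, which this sketch does not attempt.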

1 Comment

-1: 1. That's what I answered 15 minutes earlier. 2. Read the question. He knows about the HtmlAgilityPack and doesn't want it.
0

It sounds to me like you are trying to do HTML unit tests. Have you looked into Selenium? It even has a C# library, so you can write your HTML unit tests in C#, assert that elements exist and have the correct values, and even click on links. It also works with JavaScript / AJAX sites.

1 Comment

It's too slow for what I want. In Rails I use Webrat for the majority of my acceptance testing. It's an in-memory browser (basically an HTML parser), so it's very fast. I may then use Watir/Selenium etc. for a smoke test, but that's very slow, so I don't want to use it for everything.
0

The best parser for HTML is the HTQL COM. You can use HTQL queries to retrieve HTML content.
