I have a C# application that receives an HTML file. I want to parse and validate it. The output should be either a list of errors or a confirmation that the HTML is valid.

Does anyone have any idea how I can do this?

  • possible duplicate of What is the best way to parse html in C#? Commented Oct 4, 2010 at 9:11
  • The validation part of this question makes it quite distinct from questions about simply parsing HTML. Commented Oct 4, 2010 at 9:15
  • That's right, I'm not interested in parsing HTML; I'm interested in validating it for possible errors. Commented Oct 4, 2010 at 9:32

3 Answers


I'd run a local instance of the W3C Markup Validation Service and communicate with it via its API.
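
As a rough sketch of what talking to that API from C# could look like: the code below posts the markup as the `fragment` parameter with `output=soap12` and reads the errors out of the SOAP response. The URL `http://localhost/w3c-validator/check` is only an assumed location for a local install, and the class and method names (`W3CValidatorClient`, `MarkupError`, `ValidateAsync`, `ParseErrors`) are made up for illustration. The XML namespace and the `error`/`line`/`col`/`message` element names are what I recall from the validator's API documentation, so check them against the response of your own instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical holder for one validator error.
public sealed class MarkupError
{
    public MarkupError(int line, int column, string message)
    {
        Line = line;
        Column = column;
        Message = message;
    }

    public int Line { get; }
    public int Column { get; }
    public string Message { get; }
}

public static class W3CValidatorClient
{
    // Namespace of the validator's SOAP 1.2 output (verify against your instance).
    private static readonly XNamespace M = "http://www.w3.org/2005/10/markup-validator";

    // Assumed URL of a locally installed validator; adjust to your setup.
    private const string CheckUrl = "http://localhost/w3c-validator/check";

    // Sends the markup to the validator and returns the reported errors.
    public static async Task<IReadOnlyList<MarkupError>> ValidateAsync(HttpClient http, string html)
    {
        var form = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            ["fragment"] = html,   // the document source itself
            ["output"] = "soap12", // ask for the machine-readable response
        });

        using (var response = await http.PostAsync(CheckUrl, form))
        {
            response.EnsureSuccessStatusCode();
            return ParseErrors(await response.Content.ReadAsStringAsync());
        }
    }

    // Extracts every <m:error> element from a SOAP response body.
    public static IReadOnlyList<MarkupError> ParseErrors(string soapXml)
    {
        return XDocument.Parse(soapXml)
            .Descendants(M + "error")
            .Select(e => new MarkupError(
                (int?)e.Element(M + "line") ?? 0,
                (int?)e.Element(M + "col") ?? 0,
                ((string)e.Element(M + "message") ?? string.Empty).Trim()))
            .ToList();
    }
}
```

An empty list from `ValidateAsync` would mean the validator reported no errors. Parsing is kept in its own method so it can be tried against a saved response without a running validator. Very large documents may need a multipart upload instead of a URL-encoded form.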


1 Comment

I was not aware there was an API for this, nice find.

You can use HTML Tidy. There is a .NET wrapper for it called TidyManaged.

2 Comments

TidyManaged does not provide a functional DLL.
Some issues were filed about this, including that the file output doesn't even work (and I confirmed it, despite it apparently being patched already). On the issues page is a link to a version by freethenation that works and requires libtidy32.dll and libtidy64.dll, so I followed gcores's link above and renamed the 32-bit and 64-bit versions appropriately. It took a while to figure out, so I thought I'd post that here.

There is an obscure DLL from framework version 1.0 (!), Microsoft.mshtml.dll, and it is the only way in the framework to deal with the DOM. If the HTML is XHTML and valid XML, then you can use the XML APIs; otherwise this is the only option.
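
For the XHTML case, a minimal sketch of the XML route could look like the following. It only checks that the markup is well-formed XML, which is much weaker than real HTML validation, and the class and method names (`XhtmlWellFormedness`, `Check`) are invented for the example. It assumes the DOCTYPE can simply be ignored, which means named entities such as `&nbsp;` will be reported as errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XhtmlWellFormedness
{
    // Returns an empty list if the markup is well-formed XML,
    // otherwise a single entry describing the first problem found.
    public static List<string> Check(string markup)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            // Skip the DOCTYPE instead of rejecting or downloading it.
            DtdProcessing = DtdProcessing.Ignore,
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the whole document to be parsed.
                while (reader.Read()) { }
            }
        }
        catch (XmlException ex)
        {
            // The reader stops at the first error, so at most one is reported.
            errors.Add($"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}");
        }

        return errors;
    }
}
```

Because the XML reader stops at the first problem, this cannot produce a full list of errors in one pass, and ordinary HTML such as an unclosed `<br>` is rejected outright.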

4 Comments

I'd be amazed that that was the only way to deal with DOM.
Hmm, can you explain how you would validate a very elaborate HTML file with XML? I thought about that too, and I don't think it's the best way.
In what framework? Nobody mentioned a framework. (Oh, and must we resort to name calling?)
It's not so obscure; it's the PIA for Internet Explorer. It's not part of the framework; it's a COM interop library. Whether IE is a good validator for HTML is, ahem, debatable.
