To answer your actual question from four years ago (as of this answer), here is a solution. I wouldn't be surprised if you have since found another way to do this, so this is mostly for other people searching for something similar. Keep in mind, however, that this approach is
- somewhat obsolete (it relies on HtmlDocument)
- not the best way to handle HTML DOM parsing (the preferred solution is HtmlAgilityPack, CsQuery, or another library that actually parses the HTML rather than relying on regular expressions)
- extremely hacky, and therefore not the safest or most compatible way to do it
- something you really should not be doing
Additionally, keep in mind that HtmlDocument is really just a wrapper around the MSHTML document object (exposed through the IHTMLDocument2 COM interface), so it is technically slower than using a COM wrapper directly, but I completely understand the use case simply for ease of coding.
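For comparison, the parser-based route recommended in the list above could look roughly like this with HtmlAgilityPack. This is only a minimal sketch: it assumes the HtmlAgilityPack NuGet package is referenced, and the `AgilityPackExample` class and `ExtractDivText` method are names I made up for illustration.

```csharp
using System;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class AgilityPackExample
{
    // Returns the text of the first <div> in the markup, or an empty
    // string if the markup contains no <div>.
    public static string ExtractDivText(string html)
    {
        // Fully qualified to avoid confusion with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode takes an XPath expression and returns null on no match.
        var div = doc.DocumentNode.SelectSingleNode("//div");
        return div?.InnerText ?? string.Empty;
    }
}
```

Calling `AgilityPackExample.ExtractDivText("<html><body><div>Hello, world!</div></body></html>")` returns `Hello, world!` without touching MSHTML, COM, or reflection.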
If you're cool with all of the above, here's how to accomplish what you want.
using System;
using System.Reflection;
using System.Windows.Forms;

public class HtmlDocumentFactory
{
    private static readonly Type htmlDocType = typeof(System.Windows.Forms.HtmlDocument);
    private static Type htmlShimManagerType = null;
    private static object htmlShimSingleton = null;
    private static ConstructorInfo docCtor = null;

    public static HtmlDocument Create()
    {
        // check the singleton rather than the type, so a failed first attempt is retried
        if (htmlShimSingleton == null)
        {
            // get a type reference to the non-public HtmlShimManager class
            htmlShimManagerType = htmlDocType.Assembly.GetType(
                "System.Windows.Forms.HtmlShimManager"
            );
            // locate the non-public parameterless constructor for HtmlShimManager
            var shimCtor = htmlShimManagerType.GetConstructor(
                BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[0], null
            );
            // create a new HtmlShimManager object and reuse it for every
            // document created afterwards
            htmlShimSingleton = shimCtor.Invoke(null);
        }
        if (docCtor == null)
        {
            // get the only instance constructor for HtmlDocument (it is non-public)
            docCtor = htmlDocType.GetConstructors(
                BindingFlags.NonPublic | BindingFlags.Instance
            )[0];
        }
        // create the MSHTML HTMLDocument COM object from its class ID; the
        // resulting object implements IHTMLDocument2
        object htmlDoc2Inst = Activator.CreateInstance(Type.GetTypeFromCLSID(
            new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")
        ));
        var argValues = new object[] { htmlShimSingleton, htmlDoc2Inst };
        // create a new HtmlDocument without involving WebBrowser
        return (HtmlDocument)docCtor.Invoke(argValues);
    }
}
To use it:
var htmlDoc = HtmlDocumentFactory.Create();
htmlDoc.Write("<html><body><div>Hello, world!</div></body></html>");
Console.WriteLine(htmlDoc.Body.InnerText);
// output:
// Hello, world!
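The factory relies on one reflection trick throughout: looking up a non-public constructor with BindingFlags.NonPublic | BindingFlags.Instance and invoking it. Here is that trick in isolation, as a small sketch with no WinForms or COM dependency. The `Locked` class and `NonPublicCtorDemo` are hypothetical stand-ins I invented for illustration; they are not part of any framework.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a type whose only constructor is non-public,
// the same situation HtmlDocument and HtmlShimManager present.
public sealed class Locked
{
    public string Name { get; }

    private Locked(string name)
    {
        Name = name;
    }
}

public static class NonPublicCtorDemo
{
    public static Locked Create(string name)
    {
        // Ask for a non-public instance constructor taking a single string.
        var ctor = typeof(Locked).GetConstructor(
            BindingFlags.NonPublic | BindingFlags.Instance,
            null,
            new[] { typeof(string) },
            null
        );

        // GetConstructor returns null when nothing matches, so fail loudly
        // here instead of hitting a NullReferenceException on Invoke.
        if (ctor == null)
            throw new MissingMethodException("Locked", ".ctor(string)");

        return (Locked)ctor.Invoke(new object[] { name });
    }
}
```

`NonPublicCtorDemo.Create("demo").Name` evaluates to `demo`. The null check is worth copying into the factory if you use it for real, since a framework update could rename or remove the internal types it looks up.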
I have not tested this code directly; I translated it from an old PowerShell script that needed the same functionality you're requesting. If it fails, let me know. The approach works, but the code might need very minor tweaking to run.