'object reference not set to an instance of an object' for HTMLWorker parser

Question

Document document = new Document(PageSize.LETTER, 10, 10, 10, 10);
StringReader reader = new StringReader(edittedHTML);
HTMLWorker worker = new HTMLWorker(document);
string fileName = "test.pdf";
PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
document.Open();
worker.Parse(reader);
worker.EndDocument();
worker.Close();
document.Close();

When the program reaches worker.Parse, it throws the error quoted in the title.

The edittedHTML variable holds the HTML source of a web page as a string.

Does anyone know how to solve this, or what is going wrong?

The stack trace:

at iTextSharp.text.html.simpleparser.HTMLWorker.StartElement(String tag, IDictionary`2 attrs)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.ProcessTag(Boolean start)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Go(TextReader reader)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Parse(ISimpleXMLDocHandler doc, ISimpleXMLDocHandlerComment comment, TextReader r, Boolean html)
at iTextSharp.text.html.simpleparser.HTMLWorker.Parse(TextReader reader)
at TestPdfApplication.Form1.button1_Click(Object sender, EventArgs e) in C:\Users\TLiu\Documents\Visual Studio 2010\Projects\TestPdfApplication\TestPdfApplication\Form1.cs:line 68

Debug and see if your worker object is actually instantiated.
– neo, Commented May 28, 2013 at 20:37
It's probably broken in a way that Chrome knows how to handle, but that iTextSharp doesn't. Show us the HTML. Running it through HTMLAgilityPack may help.
– SLaks, Commented May 29, 2013 at 15:02
By necessity (unfortunately), browsers have to cope with all sorts of broken HTML. Many tools that process (X)HTML are not so lenient. If I'm not mistaken, there are tools that will "fix" such (X)HTML by applying some of the same rules browsers use, adding to or modifying the markup so that it is "correct" according to how a browser would interpret it, but I can't come up with specific names of those tools.
– Lasse V. Karlsen, Commented Jul 9, 2015 at 14:03
The question is applicable to a large audience, but the HTMLWorker class this question focuses on was deprecated long ago due to numerous issues. That audience would be better off switching to the replacement class, XmlWorker.
– mkl, Commented Jul 9, 2015 at 20:17
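
As a rough illustration of the clean-up step suggested in the comments above, here is a minimal sketch that runs the HTML through HtmlAgilityPack before it goes to any PDF parser. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlCleaner and the method name ToXhtml are made up for this example, and whether the cleaned output is enough for a particular page is not guaranteed.

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper, not part of iTextSharp or HtmlAgilityPack.
public static class HtmlCleaner
{
    // Re-serializes possibly malformed HTML as well-formed XML.
    public static string ToXhtml(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true;   // write the document out as XML
        doc.OptionFixNestedTags = true; // try to repair badly nested tags
        doc.LoadHtml(html);

        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

The result could then be used in place of edittedHTML, for example `new StringReader(HtmlCleaner.ToXhtml(edittedHTML))`.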

Feras Salim · Accepted Answer · 2015-07-09 20:52:54Z

3

+25

I think the problem is a null reference exception thrown because the HTML contains tags that the parser cannot handle. Try removing those tags. Note, though, that HTMLWorker is no longer supported; it has been discontinued in favor of XML Worker.
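
For readers who want to try the replacement, a minimal sketch of the question's conversion using XML Worker might look like the following. It assumes iTextSharp 5.x with the separate itextsharp.xmlworker assembly referenced, and that the input is well-formed XHTML, since XML Worker is stricter than a browser. The class name HtmlToPdf and the method name Convert are hypothetical.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical helper showing XML Worker in place of HTMLWorker.
public static class HtmlToPdf
{
    public static void Convert(string html, string fileName)
    {
        using (var stream = new FileStream(fileName, FileMode.Create))
        {
            var document = new Document(PageSize.LETTER, 10, 10, 10, 10);

            // Keep the writer: XML Worker needs it as well as the document.
            PdfWriter writer = PdfWriter.GetInstance(document, stream);
            document.Open();

            using (var reader = new StringReader(html))
            {
                // Parses the XHTML and adds the resulting elements to the document.
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
            }

            document.Close();
        }
    }
}
```

With the variables from the question, the call would be `HtmlToPdf.Convert(edittedHTML, "test.pdf");`.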

edited Jul 9, 2015 at 20:52

answered Jul 9, 2015 at 20:43

Feras Salim


Ignacio · Accepted Answer · 2015-07-15 14:50:44Z

1

It looks like a null reference. Try wrapping every IDisposable object in a "using" statement:

using (HTMLWorker worker = new HTMLWorker(document))
                { (......) }

answered Jul 15, 2015 at 14:50

Ignacio

It's not IDisposable.
– Qhori, Commented over a year ago
