Document document = new Document(PageSize.LETTER, 10, 10, 10, 10);
StringReader reader = new StringReader(edittedHTML);
HTMLWorker worker = new HTMLWorker(document);
string fileName = "test.pdf";
PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
document.Open();
worker.Parse(reader);
worker.EndDocument();
worker.Close();
document.Close();

When execution reaches worker.Parse, it throws the error named in the title.

edittedHTML is the HTML string of an HTML page.

Does anyone know how to solve this, or what is going wrong?

The stack trace:

at iTextSharp.text.html.simpleparser.HTMLWorker.StartElement(String tag, IDictionary`2 attrs)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.ProcessTag(Boolean start)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Go(TextReader reader)
at iTextSharp.text.xml.simpleparser.SimpleXMLParser.Parse(ISimpleXMLDocHandler doc, ISimpleXMLDocHandlerComment comment, TextReader r, Boolean html)
at iTextSharp.text.html.simpleparser.HTMLWorker.Parse(TextReader reader)
at TestPdfApplication.Form1.button1_Click(Object sender, EventArgs e) in C:\Users\TLiu\Documents\Visual Studio 2010\Projects\TestPdfApplication\TestPdfApplication\Form1.cs:line 68
  • Debug and see if your worker object is actually instantiated. Commented May 28, 2013 at 20:37
  • @neo Yes, I think it's instantiated. Commented May 28, 2013 at 21:36
  • It's probably broken in a way that Chrome knows how to handle, but that iTextSharp doesn't. Show us the HTML. Running it through HTMLAgilityPack may help. Commented May 29, 2013 at 15:02
  • By necessity (unfortunately), browsers have to cope with all sorts of broken HTML. Many tools that process (X)HTML are not so lenient. If I'm not mistaken, there are tools that will "fix" such (X)HTML by applying some of the same rules browsers use, adding to or modifying the HTML so that it is "correct" according to how a browser would interpret it, but I can't come up with specific names for those tools. Commented Jul 9, 2015 at 14:03
  • The question is widely applicable to a large audience: the HTMLWorker this question focuses on was deprecated a long time ago due to numerous issues. That large audience had therefore better switch to the replacement class, XmlWorker. Commented Jul 9, 2015 at 20:17

2 Answers

I think the problem is a null reference exception thrown because the HTML contains tags that the parser is unable to handle. Try removing those tags. Note, though, that HTMLWorker is no longer supported: it has been discontinued in favor of XML Worker.
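
For anyone switching over, here is a minimal sketch of the same conversion using XML Worker instead of HTMLWorker. It assumes the separate XML Worker assembly (namespace iTextSharp.tool.xml) is referenced alongside iTextSharp 5.x, and that the input is well-formed XHTML, since XML Worker is stricter than a browser. The HtmlToPdf class and Convert method are hypothetical names used only for illustration.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

public static class HtmlToPdf
{
    // Hypothetical helper: writes the given XHTML string to a PDF file.
    // Assumes html is well-formed XHTML; malformed markup may still fail to parse.
    public static void Convert(string html, string fileName)
    {
        using (var stream = new FileStream(fileName, FileMode.Create))
        {
            var document = new Document(PageSize.LETTER, 10, 10, 10, 10);

            // XML Worker needs the writer as well as the document,
            // so keep the reference returned by GetInstance.
            PdfWriter writer = PdfWriter.GetInstance(document, stream);
            document.Open();

            using (var reader = new StringReader(html))
            {
                XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, reader);
            }

            // Closing the document flushes the PDF content to the stream.
            document.Close();
        }
    }
}
```

With the question's variables this would be called as HtmlToPdf.Convert(edittedHTML, "test.pdf"). If the page is not valid XHTML, it may need to be cleaned up first, for example with HTMLAgilityPack as suggested in the comments on the question.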

It looks like a null reference. Try using the "using" syntax with all the IDisposable objects:

using (HTMLWorker worker = new HTMLWorker(document))
                { (......) }

1 Comment

It's not IDisposable.
