
I'm trying to create a PDF from an HTML page. The CMS I'm using is EPiServer.

This is my code so far:

    protected void Button1_Click(object sender, EventArgs e)
    {
        naaflib.pdfDocument(CurrentPage);
    }


    public static void pdfDocument(PageData pd)
    {
        //Extract data from Page (pd).
        string intro = pd["MainIntro"].ToString(); // Attribute
        string mainBody = pd["MainBody"].ToString(); // Attribute

        // Prepare the HttpContext response
        HttpContext.Current.Response.Clear();
        HttpContext.Current.Response.ContentType = "application/pdf";

        // Create PDF document
        Document pdfDocument = new Document(PageSize.A4, 80, 50, 30, 65);
        //PdfWriter pw = PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);
        PdfWriter.GetInstance(pdfDocument, HttpContext.Current.Response.OutputStream);  

        pdfDocument.Open();
        pdfDocument.Add(new Paragraph(pd.PageName));
        pdfDocument.Add(new Paragraph(intro));
        pdfDocument.Add(new Paragraph(mainBody));
        pdfDocument.Close();
        HttpContext.Current.Response.End();
    }

This outputs the article name, intro text and main body, but it does not parse the HTML inside the article text, and there is no layout.

I've tried having a look at http://itextsharp.sourceforge.net/tutorial/index.html without becoming any wiser.

Any pointers in the right direction are greatly appreciated :)

1 Answer


For later versions of iTextSharp:

With iTextSharp you can use the iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList() method to create a PDF from HTML.

ParseToList() takes a TextReader (an abstract class) for its HTML source, which means you can use a StringReader or StreamReader (both of which derive from TextReader). I used a StringReader and was able to generate PDFs from simple markup. When I tried the HTML returned from real web pages, however, I got errors on all but the simplest ones. Even the simplest page I retrieved (http://black.ea.com/) rendered the content of the page's 'head' tag into the PDF, so I think HTMLWorker.ParseToList() is picky about the formatting of the HTML it parses.
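As a minimal sketch of the StringReader route with hard-coded markup (assuming iTextSharp 5.x, where ParseToList() returns a List<IElement>; the markup string and output filename are just examples):

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Simple, well-formed markup parses reliably; full web pages often do not.
string simpleMarkup = "<p>Hello <b>world</b></p><p><i>From a StringReader.</i></p>";

Document document = new Document(PageSize.A4, 80, 50, 30, 65);
using (FileStream fs = new FileStream("InlineMarkup.pdf", FileMode.Create)) {
    PdfWriter.GetInstance(document, fs);
    document.Open();
    using (StringReader reader = new StringReader(simpleMarkup)) {
        // Each parsed element (paragraph, list, table, ...) is added in order.
        foreach (IElement element in HTMLWorker.ParseToList(reader, null)) {
            document.Add(element);
        }
    }
    document.Close();
}
```

The second argument to ParseToList() is an optional StyleSheet; passing null uses the defaults.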

Anyway, if you want to try here's the test code I used:

// Download content from a very, very simple "Hello World" web page.
string download = new WebClient().DownloadString("http://black.ea.com/");

Document document = new Document(PageSize.A4, 80, 50, 30, 65);
try {
    using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
        PdfWriter.GetInstance(document, fs);
        using (StringReader stringReader = new StringReader(download)) {
            // In iTextSharp 5.x, ParseToList() returns a List<IElement>
            // (System.Collections.Generic), not an ArrayList.
            List<IElement> parsedList = HTMLWorker.ParseToList(stringReader, null);
            document.Open();
            foreach (IElement item in parsedList) {
                document.Add(item);
            }
            document.Close();
        }
    }

} catch (Exception exc) {
    Console.Error.WriteLine(exc.Message);
}

I couldn't find any documentation on which HTML constructs HTMLWorker.ParseToList() supports; if you do, please post it here. I'm sure a lot of people would be interested.

For older versions of iTextSharp: You can use the iTextSharp.text.html.HtmlParser.Parse method to create a PDF from HTML.

Here's a snippet demonstrating this:

Document document = new Document(PageSize.A4, 80, 50, 30, 65); 
try  {
   using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
      PdfWriter.GetInstance(document, fs);
      HtmlParser.Parse(document, "YourHtmlDocument.html");
   }
} catch(Exception exc)  { 
   Console.Error.WriteLine(exc.Message); 
} 

The one problem (major for me) is that the HTML must be strictly XHTML compliant.
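Since the parser rejects non-XHTML input, it can help to check that the markup is well-formed XML before handing it over. Here's a sketch using the standard .NET XmlDocument (not part of iTextSharp; the helper name is my own):

```csharp
using System;
using System.Xml;

class XhtmlCheck {
    // Hypothetical helper: true if the markup parses as well-formed XML,
    // which is a rough proxy for "strictly XHTML compliant".
    static bool IsWellFormedXhtml(string markup) {
        try {
            new XmlDocument().LoadXml(markup);
            return true;
        } catch (XmlException) {
            return false;
        }
    }

    static void Main() {
        // "<br>" is legal HTML but not well-formed XML; "<br />" is both.
        Console.WriteLine(IsWellFormedXhtml("<p>Hello<br>world</p>"));   // False
        Console.WriteLine(IsWellFormedXhtml("<p>Hello<br />world</p>")); // True
    }
}
```

This only checks well-formedness, not validity against the XHTML DTD, but it catches the unclosed tags that most often trip up the parser.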

Good luck!


4 Comments

Great. Thanks. Can YourHtmlDocument.html be the URL to the page?
PS. I don't think the newest version of iTextSharp has HtmlParser. The closest I get is iTextSharp.text.html.simpleparser.HTMLWorker, but using that requires a TextReader for the parsing...
@Steven - You're right! Sorry about that, I loaded up an old test program (with an old version of iTextSharp) when I answered your question. HTMLWorker is the way you'd want to do it. I edited my response based on the (limited) testing I did with HTMLWorker and iTextSharp 5.0.
Great job Jay! If I remember correctly, the web pages must use XHTML Strict, or be 100% correct XHTML. So maybe that's why it's so picky! I will give it a go. If it proves not to be good enough, I will use ABCpdf.
