iTextSharp HTML parser not parsing the JavaScript

Question

I'm using iTextSharp v5.4.2 in an MVC 4 web app. When I try to add the view returned on the page, which loads a few JavaScript files, iTextSharp's HTML parser fails to parse the HTML string.

Is there an alternative way to parse the web page so that it can be converted to PDF with iTextSharp? Please correct me if I'm using the wrong approach.

<script type="text/javascript">

$(document).ready(function(){});

</script> 

<html><table>adsfasdf..</table> some table elements.........</html>

C# code:

PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(pdfpath + "/abcdtest.pdf", FileMode.Create));
doc.Open();
var parsedHtmlElement = HTMLWorker.ParseToList(new StringReader(decodedHtmlElement), null);

If you are converting HTML to PDF, <script> tags won't work. Please don't use JavaScript in HTML that you convert to PDF. – Manish Sharma, Commented Jul 22, 2013 at 10:26
So is there no other way to parse that page? Please tell me how to ignore the script tags in the HTML string that I pass in for PDF conversion. – Karthika Subramanian, Commented Jul 22, 2013 at 10:28
So you want only the HTML markup in your PDF, am I right? – Manish Sharma, Commented Jul 22, 2013 at 10:37
Yes. That could also be done in C#: filter the HTML from the page I get, then parse it. Please let me know, it will be helpful. Thanks in advance. :) – Karthika Subramanian, Commented Jul 22, 2013 at 10:48

Manish Sharma · Accepted Answer · 2013-07-22 10:54:14Z

3

Use this function: pass your HTML string in HTMLCode and the path where the PDF should be saved in filePath.

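using System.IO;
using System.Text.RegularExpressions;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Method of your controller or helper class.
public void converttopdf(string HTMLCode, string filePath)
{
    // Remove every <script>...</script> block. The lazy [\s\S]*? stops at the first
    // closing tag, so markup between two script blocks is kept. (A greedy [^*]*
    // would run from the first <script to the last </script> and delete it all.)
    HTMLCode = Regex.Replace(HTMLCode, @"<script\b[^>]*>[\s\S]*?</script\s*>", "", RegexOptions.IgnoreCase);

    Document document = new Document();
    using (FileStream stream = new FileStream(filePath, FileMode.Create))
    {
        PdfWriter.GetInstance(document, stream);
        document.Open();
        try
        {
            // HTMLWorker turns the cleaned HTML into iTextSharp elements.
            var elements = HTMLWorker.ParseToList(new StringReader(HTMLCode), null);
            foreach (IElement element in elements)
            {
                document.Add(element);
            }
        }
        finally
        {
            // Always close the document; parse errors now propagate instead of
            // being swallowed by an empty catch block.
            document.Close();
        }
    }
}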
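A regex of the form `(<script[^*]*</script>)` deserves a closer look: `[^*]*` means "any run of characters other than `*`", and because it is greedy it extends from the first `<script` to the last `</script>`, deleting everything in between (as long as no `*` appears in it). The sketch below compares that pattern with a lazy one. It is written in JavaScript, whose regex engine treats these particular constructs the same way .NET does; `stripScripts` and the sample string are hypothetical names used only for illustration.

```javascript
// Greedy pattern: [^*]* swallows everything up to the LAST "</script>".
const greedyScriptPattern = /(<script[^*]*<\/script>)/gi;

// Lazy pattern: [\s\S]*? stops at the FIRST closing tag after each opening tag.
const lazyScriptPattern = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;

// Hypothetical helper: remove every match of the given pattern.
function stripScripts(html, pattern) {
  return html.replace(pattern, "");
}

// Two script blocks with the markup we want to keep in between.
const sample =
  '<script type="text/javascript">$(document).ready(function(){});</script>' +
  "<table><tr><td>keep me</td></tr></table>" +
  "<script>console.log('second');</script>";

console.log(JSON.stringify(stripScripts(sample, greedyScriptPattern))); // ""
console.log(JSON.stringify(stripScripts(sample, lazyScriptPattern)));   // "<table><tr><td>keep me</td></tr></table>"
```

The lazy pattern is still a heuristic rather than an HTML parser, so unusual markup (for example, a `<script>` with no closing tag) is not handled the way a browser would handle it.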

answered Jul 22, 2013 at 10:54

Manish Sharma


Got one more idea and posted it below. :) – Karthika Subramanian

Karthika Subramanian · Accepted Answer · 2013-07-23 07:19:06Z

1

This can also be solved on the client side: in the JavaScript code, strip the script tags and pass only the HTML to the C# code, instead of removing the script tags in C#.

For example:

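function IgnoreScripts(htmlString)
{
    // Parse the string into a detached element; scripts inserted via innerHTML do not run.
    var div = document.createElement('div');
    div.innerHTML = htmlString;
    // getElementsByTagName returns a live collection, so iterate backwards while removing.
    var scripts = div.getElementsByTagName('script');
    var i = scripts.length;
    while (i--) {
        scripts[i].parentNode.removeChild(scripts[i]);
    }
    return div.innerHTML;
}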

answered Jul 23, 2013 at 7:19

Karthika Subramanian
