
I have a collection of one thousand HTML files and need to trim them somewhat. I need to delete all the tags inside the <body></body> area except for one, <div.pg>, so that the pages print cleanly. The extra elements are navigation links, which make the printouts messy and make each page use more paper. The contents differ from file to file, so I can't find and replace a fixed code excerpt, but the tags are the same: for example, there are 3 <table> tags to be deleted, each with a specific class. How can I manipulate specific tags inside a batch of HTML files?

Is there any batch-processing technique or software that can do this job? What is an easy solution on Windows?

  • If it's for print, why not simply add a @media print stylesheet to hide any page sections you DON'T want printed? Commented Sep 27, 2011 at 20:57
  • In fact I want to convert them into PDF before printing, does that help it? would Acrobat render HTML files as to be printed and then make the PDFs? Commented Sep 27, 2011 at 20:59
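The @media print idea from the first comment can itself be applied in batch. Below is a minimal sketch, assuming all pages sit in one folder, each has a </head> tag, and the content to keep is the element(s) with class "pg" directly under <body>. The names PRINT_CSS, add_print_css and add_print_css_to_folder are hypothetical. Whether a given PDF converter applies print-media rules is not covered here.

```python
import re
from pathlib import Path

# Hide every direct child of <body> except class="pg", only when printing.
PRINT_CSS = (
    '<style type="text/css">'
    "@media print { body > *:not(.pg) { display: none !important; } }"
    "</style>"
)

def add_print_css(html):
    """Insert PRINT_CSS just before </head>. Returns html unchanged if the
    stylesheet is already present or no </head> tag is found."""
    if PRINT_CSS in html:
        return html
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        return html
    return html[:match.start()] + PRINT_CSS + html[match.start():]

def add_print_css_to_folder(folder):
    """Patch every *.html file in folder in place; returns how many changed."""
    changed = 0
    for path in Path(folder).glob("*.html"):
        # latin-1 maps each byte to exactly one character, so reading and
        # writing with it leaves the original bytes untouched (for any
        # ASCII-compatible encoding such as UTF-8 or windows-1252); only the
        # inserted ASCII stylesheet is new.
        text = path.read_text(encoding="latin-1")
        new_text = add_print_css(text)
        if new_text != text:
            path.write_text(new_text, encoding="latin-1")
            changed += 1
    return changed
```

Running add_print_css_to_folder twice is harmless: the second pass finds the stylesheet already present and changes nothing.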

2 Answers


I would use an XSLT transform on each HTML page you have. A batch script is not the right tool for manipulating HTML files, but you can use batch as a "manager" that passes each file to the XSL transform. Windows also has a rudimentary MSXML utility, which you can download and install on your machine: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21714

That's how I would do it. I am sure there are more options.
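For comparison, here is a minimal sketch of the same "manager plus transform" idea that swaps XSLT for Python's standard-library ElementTree, so no extra install is needed. It assumes the pages are well-formed XHTML (with or without the XHTML namespace) and that the element to keep is <div class="pg"> directly under <body>. The function names and folder paths are hypothetical. Files that fail to parse (for example, HTML that isn't well-formed XML, or XHTML using entities such as &nbsp; that the parser cannot resolve) are reported rather than modified.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# Serialize the XHTML namespace as the default one (no "ns0:" prefixes).
ET.register_namespace("", "http://www.w3.org/1999/xhtml")

def local(tag):
    """Tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""

def keep_only_pg(xhtml_bytes):
    """Remove every direct child of <body> except <div class="pg">."""
    root = ET.fromstring(xhtml_bytes)  # raises ET.ParseError if not well-formed
    body = next((el for el in root.iter() if local(el.tag) == "body"), None)
    if body is None:
        raise ValueError("no <body> element")
    for child in list(body):  # iterate over a copy: body is mutated below
        is_pg = local(child.tag) == "div" and "pg" in child.get("class", "").split()
        if not is_pg:
            body.remove(child)
    # us-ascii turns any non-ASCII character into a numeric reference (&#233;),
    # so the output stays valid whatever charset the page declares.
    return ET.tostring(root, encoding="us-ascii")

def process_folder(src_dir, dst_dir):
    """The 'manager' loop: clean every *.html in src_dir into dst_dir.
    Returns the names of files that could not be processed."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(src.glob("*.html")):
        try:
            cleaned = keep_only_pg(path.read_bytes())
        except (ET.ParseError, ValueError):
            failed.append(path.name)
            continue
        (dst / path.name).write_bytes(cleaned)
    return failed

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.  python keep_pg.py C:\pages C:\pages_clean   (hypothetical paths)
    print("could not process:", process_folder(sys.argv[1], sys.argv[2]))
```

Writing to a separate output folder keeps the originals intact, so a bad run costs nothing. Note that the DOCTYPE line is not carried over to the output.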


4 Comments

Thanks, but by "batch" I meant processing a group of files at once.
Ah, OK, sorry, my mistake. Do the images fit on one A4 page? Does each HTML page contain only one image?
Each HTML file has one <div.pg>, which usually contains more than one image, and I want to keep that whole <div.pg>, which sits directly inside the <body>.
I would still go with XSLT. I would select the <div.pg> directly under the body and create a new HTML file containing only the elements I wanted; that would then be relatively easy to print. For converting the pages to PDF, I can also suggest an open-source tool that I have used in the past: code.google.com/p/wkhtmltopdf
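Converting a thousand cleaned pages to PDF one by one is the other repetitive step. A minimal sketch, assuming wkhtmltopdf is installed and invoked with its basic "wkhtmltopdf input.html output.pdf" form; the function name convert_all and the dry_run switch are hypothetical, and on Windows you may need to pass the full path to the executable as exe.

```python
import shutil
import subprocess
from pathlib import Path

def convert_all(folder, exe="wkhtmltopdf", dry_run=False):
    """Run 'wkhtmltopdf page.html page.pdf' for every *.html in folder.
    Returns (command, return_code) pairs; return_code is None in a dry run."""
    if not dry_run and shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} is not on PATH")
    results = []
    for page in sorted(Path(folder).glob("*.html")):
        cmd = [exe, str(page), str(page.with_suffix(".pdf"))]
        if dry_run:
            results.append((cmd, None))
        else:
            # No check=True: a failure on one page is recorded instead of
            # stopping the whole batch.
            results.append((cmd, subprocess.run(cmd).returncode))
    return results
```

Calling convert_all(folder, dry_run=True) first lists the commands without running anything, which is a cheap way to confirm the right files are picked up.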

If it is XHTML, you could use XSLT to transform your HTML into another format. See, for example, http://www.w3schools.com/xsl/ or http://help.hannonhill.com/discussions/how-do-i/269-strip-specific-html-tag-in-xslt
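The second link is about stripping specific tags, which matches the question's other framing: delete the three <table> elements by class instead of keeping only <div.pg>. Here is a minimal sketch of that "strip specific tags" approach using Python's standard-library ElementTree rather than XSLT, assuming well-formed XHTML. The class names in UNWANTED are hypothetical placeholders, since the question doesn't name them.

```python
import xml.etree.ElementTree as ET

# Serialize the XHTML namespace as the default one (no "ns0:" prefixes).
ET.register_namespace("", "http://www.w3.org/1999/xhtml")

# Hypothetical (tag, class) pairs; replace with the classes your pages use.
UNWANTED = {("table", "nav-top"), ("table", "nav-side"), ("table", "nav-bottom")}

def local(tag):
    """Tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""

def strip_tags(xhtml_bytes, unwanted=UNWANTED):
    """Delete every element whose (tag, class) matches an entry in unwanted."""
    root = ET.fromstring(xhtml_bytes)
    # ElementTree elements have no parent pointer, so map child -> parent first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter()):
        classes = el.get("class", "").split()
        if any(local(el.tag) == tag and cls in classes for tag, cls in unwanted):
            parent_of[el].remove(el)
    return ET.tostring(root, encoding="us-ascii")
```

One ElementTree detail: text that directly follows a removed element (its tail) is dropped with it, which for navigation tables is usually just whitespace.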
