
Is there a CPAN module or code snippet that I can use to modify local HTML files without using regular expressions?

What I want to do:

  1. Change a start tag (example: <div> to <div id="newtag">)
  2. Add a tag before another (example: </head> to <script type="text/javascript">...</script></head>)
  3. Remove tags
  4. Read the content of a given tag (this one can be done with an XML/HTML parser)
    There are quite a few answers to this on StackOverflow already, many with detailed examples. Always search first. :) Also, if you wonder whether there is a CPAN module, go to CPAN and look. :) Commented Oct 17, 2010 at 20:10

2 Answers


If you have HTML, and not XHTML, then you don't want to use an XML parser: ordinary HTML is usually not well-formed XML, so an XML parser will reject it.

HTML::Parser is the standard HTML parser for Perl. Pretty much everything else is built on top of it.
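As a rough illustration of the callback style, here is a minimal sketch using HTML::Parser's version 3 API. It assumes HTML::Parser is installed from CPAN (it is not core Perl); the sample HTML and the variable names (@tags, $in_title, $title) are only illustrative. Each handler is given as a [callback, argspec] pair, and the parser pushes every start tag, end tag and text chunk to those callbacks.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN distribution HTML-Parser; not part of core Perl

# Illustrative input; for a local file you could call $p->parse_file($path)
my $html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

my @tags;            # start-tag names, in document order
my $in_title = 0;    # true while the parser is inside <title>...</title>
my $title    = '';   # accumulated text of the <title> element

my $p = HTML::Parser->new(
    api_version => 3,
    # Each handler is [ callback, argspec ]; 'tagname' arrives lowercased
    start_h => [ sub {
        my ($tag) = @_;
        push @tags, $tag;
        $in_title = 1 if $tag eq 'title';
    }, 'tagname' ],
    end_h => [ sub {
        my ($tag) = @_;
        $in_title = 0 if $tag eq 'title';
    }, 'tagname' ],
    # 'dtext' is the text with entities decoded; it may arrive in several chunks
    text_h => [ sub { $title .= $_[0] if $in_title }, 'dtext' ],
);
$p->parse($html);
$p->eof;   # flush any buffered text

print "tags: @tags\n";     # tags: html head title body p
print "title: $title\n";   # title: Demo
```

Because the parser drives the callbacks, any state you need (such as "am I inside <title>?") has to be kept in your own variables.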

HTML::TokeParser is an alternative interface to HTML::Parser. It returns things on demand instead of passing everything to callbacks.
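For comparison, a minimal pull-style sketch with HTML::TokeParser, assuming it is installed (it ships in the HTML-Parser distribution on CPAN). The sample HTML and the @links array are illustrative. Here your loop asks for the next <a> tag instead of waiting for callbacks.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # ships in the HTML-Parser distribution on CPAN

my $html = '<html><body><a href="/one">One</a> <a href="/two">Two</a></body></html>';

# new() accepts a file name, a file handle, or a reference to a string
my $p = HTML::TokeParser->new(\$html) or die "cannot create parser: $!";

my @links;
# get_tag('a') pulls the next <a> start tag on demand and returns
# [ $tagname, \%attr, \@attrseq, $original_text ]
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};
    # Collect the text up to the matching </a>, with whitespace trimmed
    my $text = $p->get_trimmed_text('/a');
    push @links, "$text -> $href";
}
print "$_\n" for @links;   # "One -> /one", then "Two -> /two"
```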

HTML::TreeBuilder builds a DOM-like tree from the HTML, which you can then modify.
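Since the question asks for all four operations, here is a minimal sketch of how they might look with HTML::TreeBuilder, assuming the HTML-Tree distribution is installed from CPAN. The sample HTML, the id value, the script body and the choice to delete <font> elements are illustrative assumptions, not part of the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN distribution HTML-Tree; not core Perl
use HTML::Element;

# Illustrative input; for a local file use HTML::TreeBuilder->new_from_file($path)
my $html = <<'HTML';
<html><head><title>Demo</title></head>
<body><div>Hello <b>world</b></div><font>old markup</font></body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# 1. Change a start tag: give the first <div> an id attribute
my $div = $tree->look_down(_tag => 'div');
$div->attr(id => 'newtag');

# 2. Add a tag before </head>: append a <script> as the last child of <head>
my $head   = $tree->look_down(_tag => 'head');
my $script = HTML::Element->new('script', type => 'text/javascript');
$script->push_content('var loaded = 1;');
$head->push_content($script);

# 3. Remove tags: delete every <font> element together with its content
$_->delete for $tree->look_down(_tag => 'font');

# 4. Read the content of a given tag as plain text
my $text = $div->as_text;   # "Hello world"

# Serialize the modified tree (indented with two spaces), then free it
my $out = $tree->as_HTML(undef, '  ');
$tree->delete;
print $out;
```

To write the result back, you would print $out to the original file. If you want to drop a tag but keep its text, HTML::Element also provides replace_with_content.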

HTML::TreeBuilder::XPath extends HTML::TreeBuilder with XPath support.
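A minimal sketch of the XPath approach, assuming HTML::TreeBuilder::XPath is installed from CPAN. The sample HTML and the class="x" selector are illustrative. The nodes returned by findnodes are ordinary HTML::Element objects, so they can be read or modified with the same methods as in plain HTML::TreeBuilder.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # CPAN; subclass of HTML::TreeBuilder

my $html = '<html><head><title>Demo</title></head><body>'
         . '<div class="x">A</div><div>B</div><div class="x">C</div>'
         . '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Select every <div> whose class attribute is exactly "x" and read its text
my @texts = map { $_->as_text } $tree->findnodes('//div[@class="x"]');
print "@texts\n";   # A C

$tree->delete;
```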

HTML::Query extends HTML::TreeBuilder with jQuery-like selectors.

pQuery is another module that brings more complete jQuery compatibility to HTML::TreeBuilder.


CPAN

A simple CPAN search returns

XPath

It sounds like you are not familiar with XPath. Here is a quick tutorial to get you familiar. It's not Perl, but it explains the concepts.
