Parsing / Extracting the inside of an HTML Tag using Perl?

Question

I've been searching for this over the past couple of days but still haven't found a clear way to do it. I know it's simple to parse HTML with Perl to retrieve the text between tags, but I need to retrieve the text inside a tag instead, such as this:

<input type="hidden" name="next_webapp_page" value=""/>

Here, I would want to extract the entire tag (or possibly the tag excluding the word "input"). I don't want to use regular expressions; I'd prefer to use a parser. Any advice is appreciated.

Sinan Ünür · Accepted Answer · 2010-07-08 18:52:36Z

Use HTML::TokeParser::Simple to look for input tags and print each one with its as_is method; get_attr retrieves individual attribute values. Example:

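#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

# Parse the HTML from a string.
my $parser = HTML::TokeParser::Simple->new(
    string => '<input type="hidden" name="next_webapp_page" value=""/>'
);

# Visit each <input> start tag in turn.
while ( my $tag = $parser->get_tag('input') ) {
    # as_is returns the tag exactly as it appeared in the source.
    print $tag->as_is, "\n";
    # get_attr returns the value of a single named attribute.
    for my $attr ( qw( type name value ) ) {
        printf qq{%s="%s"\n}, $attr, $tag->get_attr($attr);
    }
}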

Output:

<input type="hidden" name="next_webapp_page" value=""/>
type="hidden"
name="next_webapp_page"
value=""
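
If you want only the inside of the tag without the word "input", one option is to rebuild the attribute text from get_attrseq (attribute names in source order) and get_attr. This is a minimal sketch, assuming HTML::TokeParser::Simple as above; the sub name input_tag_insides is hypothetical, and turning on HTML::Parser's empty_element_tags option is an assumption made so that the trailing "/" of "/>" is not reported as an attribute.

```perl
#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

# Hypothetical helper: return the attribute text of every <input> tag in
# $html, rebuilt in source order (the tag's inside minus the word "input").
sub input_tag_insides {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );

    # Assumption: recognise "/>" as an empty-element tag so the trailing
    # "/" is not reported as an extra attribute named "/".
    $parser->empty_element_tags(1);

    my @insides;
    while ( my $tag = $parser->get_tag('input') ) {
        # get_attrseq is an array ref of attribute names in source order.
        my @pairs = map { sprintf q{%s="%s"}, $_, $tag->get_attr($_) }
                    @{ $tag->get_attrseq };
        push @insides, join ' ', @pairs;
    }
    return @insides;
}

print "$_\n" for input_tag_insides(
    '<input type="hidden" name="next_webapp_page" value=""/>'
);
```

Because get_attr returns parsed values, this reconstructs the attributes rather than copying the raw source, so quoting and spacing may differ from the original markup.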

Rick · Comment

Thanks! I had been looking over the documentation for TokeParser, but I guess I missed this.
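
If you would rather not depend on the ::Simple wrapper, the plain HTML::TokeParser module (shipped with the HTML::Parser distribution) exposes the same data as array references. A sketch, assuming HTML::Parser is installed; raw_input_tags is a hypothetical helper name. For a start tag, get_tag returns [$tag, $attr, $attrseq, $text], and $text is the tag exactly as it appeared in the source.

```perl
#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser;

# Hypothetical helper: collect the raw source text of every <input> start tag.
sub raw_input_tags {
    my ($html) = @_;
    # A scalar reference tells HTML::TokeParser to parse that string.
    my $parser = HTML::TokeParser->new( \$html );

    my @raw;
    # get_tag returns [$tag, $attr, $attrseq, $text] for a start tag;
    # index 3 holds the original text of the tag.
    while ( my $token = $parser->get_tag('input') ) {
        push @raw, $token->[3];
    }
    return @raw;
}

print "$_\n" for raw_input_tags(
    '<p><input type="hidden" name="next_webapp_page" value=""/></p>'
);
```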
