I've been searching for this over the past couple of days but still haven't found a clear way to do it. I know it's simple to parse HTML with Perl to retrieve the text between tags, but I need to retrieve the contents of the tag itself, such as this:

<input type="hidden" name="next_webapp_page" value=""/>

Here, I would want to extract the entire tag (or possibly the tag excluding the word "input"). I don't want to use a regex; I'd prefer to use a parser. Any advice is appreciated.

1 Answer

With HTML::TokeParser::Simple, use get_tag('input') to find each input tag, print the whole tag with the as_is method, and read individual attributes with get_attr. Example:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(
    string => '<input type="hidden" name="next_webapp_page" value=""/>'
);

while ( my $tag = $parser->get_tag('input') ) {
    print $tag->as_is, "\n";
    for my $attr ( qw( type name value ) ) {
        printf qq{%s="%s"\n}, $attr, $tag->get_attr($attr);
    }
}

Output:

<input type="hidden" name="next_webapp_page" value=""/>
type="hidden"
name="next_webapp_page"
value=""

1 Comment

Thanks. I had been looking over the TokeParser documentation, but I guess I missed this.
