How can I use Perl to extract a particular part of an HTML file?

Question

I am new to Perl, and I am trying to read specific content inside the <div class="one"> element of an HTML file.

HTML file:

<div class="one">

    <div id="two">Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo.
    </div>

    <pre>Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
    </pre>

</div>

Perl Code:

use strict;
use warnings;

my $file = "content.html";

# Read every line of the file once; stop with the reason if it cannot be opened.
open(my $in, '<', $file) or die "Cannot open $file: $!";
my @contents = <$in>;
close($in);

# Check whether the content in the HTML file is in the right location:
# class="one" must sit on a line that starts with <div.
# Otherwise, print each line that starts with <div or <pre.

for my $line (@contents) {
    # !~ negates the match; "!$x =~ ..." would negate $x before matching.
    if ($line !~ m/^\s*<div/ && $line =~ m/class\s*=\s*"one"/) {
        print "content in wrong location\n";
    }
    elsif ($line =~ m/^\s*<div/) {
        print $line;
    }
    elsif ($line =~ m/^\s*<pre/) {
        print $line;
    }
}

That's not a "txt" file, it's an HTML file, and should be handled with an HTML parser. Down the "parse HTML with regex" road lies madness.
– DavidO, Commented Apr 22, 2013 at 17:11
+1 on using a parser: search.cpan.org/dist/HTML-Parser/Parser.pm
– SEngstrom, Commented Apr 22, 2013 at 17:13
@DavidO: It is a text file that happens to contain HTML. It has a MIME type of text/html.
– Borodin, Commented Apr 22, 2013 at 17:16

mzedeler · Accepted Answer · 2013-04-22 17:57:07Z

I had some success using HTML::TreeBuilder, which is good at handling broken HTML.
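
For illustration, here is a minimal sketch of how that could look for the HTML in the question. It assumes HTML::TreeBuilder is installed from CPAN; the helper name extract_div_one is made up for this example and is not part of the module. It finds the first <div class="one"> and returns the text of each child element, so it is one possible approach rather than the only way to do it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): returns the text of
# each child element of the first <div class="one"> found in the HTML string.
sub extract_div_one {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context, look_down returns the first matching element or undef.
    my $outer = $tree->look_down(_tag => 'div', class => 'one');

    my @texts;
    if ($outer) {
        for my $child ($outer->content_list) {
            # Child elements are objects; bare text nodes are plain strings.
            next unless ref $child;
            # Strips leading/trailing whitespace and collapses internal runs.
            push @texts, $child->as_trimmed_text;
        }
    }

    $tree->delete;    # release the parse tree
    return @texts;
}

# Sample input shortened from the question's HTML.
my $html = <<'HTML';
<div class="one">
    <div id="two">Donec eu libero sit amet quam egestas semper.
    </div>
    <pre>Pellentesque habitant morbi tristique senectus.
    </pre>
</div>
HTML

print "$_\n" for extract_div_one($html);
```

To read content.html directly instead of a string, HTML::TreeBuilder->new_from_file($file) can be used in place of new_from_content. If look_down finds no matching <div>, the helper returns an empty list, which is a natural place to report "content in wrong location".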
