How to convert tag names and values from XML into HTML using Perl

Question

Is there any way to convert a simple XML document into HTML using Perl that would give me a table of tag names and tag values?

The XML file output.xml is like this

<?xml version="1.0"?>

<doc>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
        <eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
        <eSTBGeneralPlatformID.0>INTEGER: 2076</eSTBGeneralPlatformID.0>
        <eSTBGeneralFamilyID.0>INTEGER: 25</eSTBGeneralFamilyID.0>
        <eSTBGeneralModelID.0>INTEGER: 60436</eSTBGeneralModelID.0>
        <eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
        <eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
    </GI-eSTB-MIB-NPH>
</doc>

I am trying to create HTML which looks like this

1. eSTBGeneralPlatformID.0 - INTEGER: 2076
2. eSTBGeneralFamilyID.0 - INTEGER: 25
3.

I was trying to use code from the web but I am really having a hard time understanding how to generate the required format for HTML tags.

What I was trying was this

#!/usr/bin/perl

use strict;
use warnings;

use XML::Parser;
use XML::LibXML;

#Add TagNumberConversion.pl here

my $parser = XML::Parser->new();
$parser->setHandlers(
    Start => \&start,
    End   => \&end,
    Char  => \&char,
    Proc  => \&proc,
);

my $header = &getXHTMLHeader();
print $header;

$parser->parsefile( '20150630104826.xml' );

my $currentTag = "";

sub start() {

    my ( $parser, $name, %attr ) = @_;
    $currentTag = $name;

    if ( $currentTag eq 'doc' ) {
        print "<head><title>"
            . "Output of snmpwalk for cpeIP4"
            . "</title></head>";
        print "<body><h2>" . "Output of snmpwalk for cpeIP4" . "</h2>";
        print '<table summary="'
            . "Output of snmpwalk for cpeIP4"
            . '"><tr><th>Tag Name</th><th>Tag Value</th></tr>';
    }
    elsif ( $currentTag eq 'GI-eSTB-MIB-NPH' ) {
        print "<tr>";
    }
    elsif ( $currentTag =~ /^eSTB/ ) {
        print "<tr>";
    }
    else {
        print "<td>";
    }
}

sub end() {

    my ( $parser, $name, %attr ) = @_;
    $currentTag = $name;

    if ( $currentTag eq 'doc' ) {
        print "</table></body></html>";
    }
    elsif ( $currentTag eq 'GI-eSTB-MIB-NPH' ) {
        print "</tr>";
    }
    elsif ( $currentTag =~ /^eSTB/ ) {
        print "</tr>";
    }
    else {
        print "</td>";
    }
}

sub char() {
    my ( $parser, $data ) = @_;

    print $data;
}

sub proc() {
    my ( $parser, $target, $data ) = @_;

    if ( lc( $target ) eq 'perl' ) {
        $data = eval( $data );
        print $data;
    }
}

sub getXHTMLHeader() {

    my $header = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">';

    return $header;
}

This is code in progress, but I realize that this will be overkill for my requirement.

So I am trying to figure out if there is any quick way to do it using Perl.

Please give me some pointers if there is indeed any quick way.

You shouldn't use an ampersand & when you are calling Perl subroutines: just my $header = getXHTMLHeader() is correct. And you shouldn't use prototypes when you are defining subroutines: those () after the subroutine names declare that the subroutine takes no parameters, which isn't what you want at all. Just sub start { ... } is correct. And you should reserve capital letters for global identifiers such as package names; local identifiers should consist of lower-case letters, decimal digits and underscores.
– Borodin, Commented Jul 2, 2015 at 21:16

It would help a lot if you showed the actual HTML that you wanted as output.
– Borodin, Commented Jul 2, 2015 at 21:20
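
The style points in the comment above can be sketched as follows. This is only an illustration: the names get_xhtml_header and start_tag are hypothetical lower-case stand-ins for the subroutines in the question, and the bodies are reduced to a single line each.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No prototype: write "sub name {", not "sub name() {".
# The name uses lower-case letters and underscores only.
sub get_xhtml_header {
    return qq{<html xmlns="http://www.w3.org/1999/xhtml">\n};
}

# A subroutine that takes arguments. Had it been declared with an
# empty prototype "()", a direct call with arguments compiled after
# the definition would be rejected at compile time.
sub start_tag {
    my ( $parser, $name, %attr ) = @_;
    return "<$name>";
}

# Call without the ampersand.
my $header = get_xhtml_header();
print $header;
print start_tag( undef, 'doc' ), "\n";
```

Perl ignores prototypes when a subroutine is called through a reference, which is how XML::Parser invokes its handlers, so the prototypes in the question are most likely harmless there, but they are still misleading.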

xxfelixxx · Accepted Answer · 2015-07-02 21:01:23Z


The quick and dirty way is to just use a regular expression. However, it comes with the risk of missing some data and getting burned by edge cases, such as elements that are empty or that span more than one line. But since you asked for it...

#!/usr/bin/env perl

use strict;
use warnings;

my $file = 'filename.xml';
open my $fh, '<', $file
    or die "unable to open $file : $!";
my $count = 1;
print "<head><title>'Output of snmpwalk for cpeIP4'</title></head>\n";
print "<body><h2>'Output of snmpwalk for cpeIP4'</h2>\n";
print "<table summary='Output of snmpwalk for cpeIP4'><tr><th>Tag Name</th><th>Tag Value</th></tr>\n";
while (my $line = <$fh>) {
    # Only lines containing an eSTB element are of interest
    next unless $line =~ m|<eSTB|;
    # Store into $tag and $value
    # the result of matching whitespace, followed by '<'
    # followed by anything (store into $tag)
    # followed by '>'
    # followed by anything (store into $value)
    # followed by '<'
    my ($tag, $value) = $line =~ m|\s+<(.+?)>(.+?)<|
        or next;    # skip lines the pattern cannot handle
    print "<tr><td>" . $count++ . ". $tag</td><td>$value</td></tr>\n";
}
close $fh;
print "</table></body></html>\n";

Produces the following:

<head><title>'Output of snmpwalk for cpeIP4'</title></head>
<body><h2>'Output of snmpwalk for cpeIP4'</h2>
<table summary='Output of snmpwalk for cpeIP4'><tr><th>Tag Name</th><th>Tag Value</th></tr>
<tr><td>1. eSTBGeneralErrorCode.0</td><td>INTEGER: 0</td></tr>
<tr><td>2. eSTBGeneralConnectedState.0</td><td>INTEGER: true(1)</td></tr>
<tr><td>3. eSTBGeneralPlatformID.0</td><td>INTEGER: 2076</td></tr>
<tr><td>4. eSTBGeneralFamilyID.0</td><td>INTEGER: 25</td></tr>
<tr><td>5. eSTBGeneralModelID.0</td><td>INTEGER: 60436</td></tr>
<tr><td>6. eSTBMoCAMACAddress.0</td><td>STRING: 0:0:0:0:0:0</td></tr>
<tr><td>7. eSTBMoCANumberOfNodes.0</td><td>INTEGER: 0</td></tr>
</table></body></html>

edited Jul 2, 2015 at 21:01

answered Jul 1, 2015 at 22:45

xxfelixxx


5 Comments

300 Over a year ago

Thanks a lot xxfelixxx. I used your quick and dirty way and it gets the job done perfectly. I am trying to understand the following line though, which I think is doing pattern matching and something more, like getting the specific part of $line into $tag and $value strings: my ($tag, $value) = $line =~ m|\s+<(.+?)>(.+)<|;

xxfelixxx Over a year ago

1. store into variables $tag and $value
2. the result of the pattern match of $line
3. matching spaces followed by a '<' followed by anything
4. capture the 'anything', it goes into $tag
5. matching '>' followed by anything
6. capture the 'anything', it goes into $value
7. followed by '<'
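
To see those steps on a single line of the sample XML, here is a small sketch. The helper name split_tag_line is made up for illustration; the pattern is the one from the answer. In list context a successful match returns the captured groups, and a failed match returns an empty list.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the answer.
# Returns ($tag, $value) on success, or an empty list on failure.
sub split_tag_line {
    my ($line) = @_;
    return $line =~ m|\s+<(.+?)>(.+?)<|;
}

my $line = "        <eSTBGeneralPlatformID.0>INTEGER: 2076</eSTBGeneralPlatformID.0>\n";
my ( $tag, $value ) = split_tag_line($line);
print "$tag - $value\n";    # prints "eSTBGeneralPlatformID.0 - INTEGER: 2076"

# One of the edge cases: an element with no text does not match,
# because the second capture needs at least one character.
my @captures = split_tag_line("    <eSTBEmpty.0></eSTBEmpty.0>\n");
print "no match for the empty element\n" unless @captures;
```

The non-greedy .+? stops at the first '>' or '<' that lets the rest of the pattern match, which is why the tag name and the value come out cleanly for one-element-per-line input like the file in the question.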

Dave Cross Over a year ago

Hmm... parsing XML with a regex. That never goes wrong :-/

300 Over a year ago

Dave, I tried to install Template module so I could use LibXML instead of using regex, but found that CPAN is not installed (or at least not working now) on the Linux box that I am using. But I'll get CPAN working and will use LibXML for parsing as it'll be a more reliable way and good learning for me as well.

Dave Cross Over a year ago

The absence of the Template Toolkit won't stop you from using XML::LibXML for parsing the XML. You'll just need to use a different approach for creating your output.
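
One possible "different approach", sketched with core Perl only: build the table rows from a list of name/content pairs and escape the text by hand. The sample data and the helpers escape_html and render_rows are assumptions made for this illustration, not part of any module; in a real script the list would come from the XML parser.

```perl
use strict;
use warnings;

# Hypothetical sample data, in the shape of a list of
# { name => ..., content => ... } hashes.
my @tags = (
    { name => 'eSTBGeneralPlatformID.0', content => 'INTEGER: 2076' },
    { name => 'eSTBGeneralFamilyID.0',   content => 'INTEGER: 25' },
);

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build one numbered table row per tag and return them as a string.
sub render_rows {
    my (@rows) = @_;
    my $html  = '';
    my $count = 0;
    for my $row (@rows) {
        $count++;
        $html .= sprintf "<tr><td>%d. %s</td><td>%s</td></tr>\n",
            $count, escape_html( $row->{name} ), escape_html( $row->{content} );
    }
    return $html;
}

print "<table>\n<tr><th>Tag Name</th><th>Tag Value</th></tr>\n";
print render_rows(@tags);
print "</table>\n";
```

Keeping the row rendering in its own subroutine keeps the HTML in one place, which is a small step toward the separation a templating system would give.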

Dave Cross · Accepted Answer · 2015-07-03 08:22:42Z


Firstly, I think you're using the wrong tool for this. I always find XML::LibXML far easier to use than XML::Parser. You load XML::LibXML, but you never make use of it.

Secondly, I think you'll find your life is easier if you think of this as two stages: one to extract the data and one to output the new data.

Here's the first stage, which stores the data you need in an array.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use XML::LibXML;
use Data::Dumper;

my $file = shift || die "Must give XML file\n";

my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);

my @tags;

# Find the nodes using an XPath expression
foreach ($doc->findnodes('//GI-eSTB-MIB-NPH/*')) {
  push @tags, { name => $_->nodeName, content => $_->textContent };
}

# Just here to show the intermediate data structure
say Dumper \@tags;

You then need to use @tags to generate your output. For over fifteen years we've known that it's a terrible idea to include hard-coded HTML in amongst your Perl code, so I'd highly recommend looking at a templating system like the Template Toolkit.

I created a xml.tt file like this:

<html>
<head>
<title>Output of snmpwalk for cpeIP4</title>
</head>
<body><h2>Output of snmpwalk for cpeIP4</h2>
<table summary='Output of snmpwalk for cpeIP4'>
<tr>
<th>Tag Name</th><th>Tag Value</th>
</tr>
[% FOREACH tag IN tags -%]
<tr><td>[% loop.count %]. [% tag.name %]</td><td>[% tag.content %]</td></tr>
[% END -%]
</table>
</body>
</html>

And then the second half of my program looks like this:

use Template;

my $tt = Template->new;
$tt->process('xml.tt', { tags => \@tags });

I hope you agree that all looks a lot simpler than your approach.

edited Jul 3, 2015 at 8:22

answered Jul 2, 2015 at 10:54

Dave Cross


2 Comments

300 Over a year ago

Thank you Dave. I agree that the solution you provided is much neater and simpler than what I was trying. I don't think I have the Template module installed, though. I am getting this error: Can't locate Template.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at Call_to_snmpwalk_V_8.pl line 14. BEGIN failed--compilation aborted at Call_to_snmpwalk_V_8.pl line 14. I am trying to find out how to correct this error. I have already included use 5.010 and I am on perl v5.10.1.

Dave Cross Over a year ago

Sounds like you don't have the Template module installed. You just need to install it by whatever means you use to install CPAN modules.
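
Before installing anything, it can help to confirm which modules Perl can actually find. The following is a rough sketch using only core Perl; module_available is a hypothetical helper name, and the check simply tries to load the module from @INC.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded
# from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm, the form require expects
    # when it is given a string rather than a bareword.
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(XML::LibXML Template)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'installed' : 'missing from @INC';
}
```

If Template is reported as missing, the "Can't locate Template.pm in @INC" error above is expected until the module is installed.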
