I have a subroutine that is called through another script to read the HTML file. Below is the code.

sub read_html
{
    $data=`cat "$_[0]"`;
    use HTML::TableExtract;
    print "CALLING read_html to read $_[0]\n";
    #my $self = shift;
    print "$_[1]";
    $te = HTML::TableExtract->new( headers => [($_[1])] );
    $te->parse($data);
    my $line_cnt=0;
    # Examine all matching tables
    foreach $ts ($te->tables)
    {
        if ($ts->rows ne "")
        {
            foreach $row ($ts->rows)
            {
                foreach (@$row) { $_='' unless defined $_; }
                print @$row;
                if (@$row[0] ne ' '  and @$row[0] ne ''  and
                    @$row[0] ne "\n" and @$row[0] ne "\t")
                {
                    $line_cnt++;
                }
            }
        }
        return $line_cnt;
    }
}

When I run the above script, it doesn't print any HTML table data if the headers are passed in a variable:

$te = HTML::TableExtract->new( headers => [($_[1])] );

However, if I replace the expression $_[1] with hard-coded values as below, it returns all the column values under the specified headers:

$te = HTML::TableExtract->new(
    headers => [("PO Number",
                 "Invoice Number",
                 "DC Number",
                 "Store Number",
                 "Invoice Amount",
                 "Discount",
                 "Amount Paid")] );

I am calling the subroutine as read_html($file, $headers) where $file is a file name and $headers holds the header values, comma separated.

Any help would be greatly appreciated.

1 Answer

"I am calling the subroutine as read_html($file, $headers) where $file is a file name and $headers as the header values comma separated."

The headers parameter of HTML::TableExtract->new expects a reference to an array of strings, where each string is a separate header. It sounds like you are instead passing it a reference to an array containing a single string containing comma characters.

my @headers = split m(\s*,\s*), $_[1];
$te = HTML::TableExtract->new( headers => \@headers );

If this is not correct, then your question needs to be more specific about how you are calling read_html.
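To make the difference concrete, here is a minimal sketch that uses only core Perl (it does not load HTML::TableExtract). The sample header string is an assumption for illustration; it compares the array reference the question's code builds with the one produced by the split.

```perl
use strict;
use warnings;

# Hypothetical input: the header names as one comma-separated string,
# the way read_html($file, $headers) receives them in the question.
my $headers = 'PO Number, Invoice Number,DC Number';

# What the question's code builds: an array ref holding ONE string.
my $as_single = [ ($headers) ];

# What the headers option needs: one array element per header name.
my @split = split /\s*,\s*/, $headers;

printf "single-string form: %d element(s)\n", scalar @$as_single;
printf "split form:         %d element(s)\n", scalar @split;
print  "  [$_]\n" for @split;
```

With the single-string form, the module is asked to find one header whose text is the whole comma-separated string, which would explain why nothing matches.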

4 Comments

Thanks, you are correct: I was calling it with a single string containing comma characters. However, I tried calling it after converting the string to an array as you suggested, but the result is the same. It does not return anything when passed as an array reference.
I did figure out the problem. When calling the subroutine I was passing the values in double quotes (""), so when I converted the string to an array the quotes went along with the values and no longer matched the headers in the file. I removed the "" and it worked perfectly. Thanks.
@YordanGeorgiev sure, start a bounty and then award it after the 24h grace period LOL ;) glad to have helped
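
The quoting problem described in the comments can be sketched as follows. parse_headers is a hypothetical helper, not part of HTML::TableExtract, and the sketch assumes that real header names never begin or end with a double quote or whitespace.

```perl
use strict;
use warnings;

# parse_headers: hypothetical helper (not part of HTML::TableExtract).
# Splits a comma-separated header string and removes whitespace and
# literal double quotes from both ends of each name.
sub parse_headers {
    my ($raw) = @_;
    my @names;
    for my $piece (split /,/, $raw) {
        my $name = $piece;                  # work on a copy of the piece
        $name =~ s/^[\s"]+|[\s"]+$//g;      # strip leading/trailing spaces and quotes
        push @names, $name if length $name; # skip empty entries
    }
    return \@names;
}

# Hypothetical input containing the stray quotes mentioned in the comments.
my $headers = parse_headers('"PO Number", "Invoice Number","DC Number"');
print "[$_]\n" for @$headers;
```

The returned array reference could then be passed directly, as in HTML::TableExtract->new( headers => parse_headers($_[1]) ).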
