I am trying to print every HTML table containing the string "kcat" from each XML file in a directory, but I am having some trouble. Each file in the directory (named kcat_tables) has at least one HTML table with kcat in it. I am running the program on an Ubuntu virtual machine. Here is my code:

#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
use Path::Iterator::Rule;
use HTML::TableExtract;
use utf8::all;
my @papers_dir_path = qw(/home/bob/kinase/kcat_tables);

my $rule = Path::Iterator::Rule->new;
$rule->name("*.nxml");
$rule->skip_dirs(".");

my $xml;
my $it = $rule->iter(@papers_dir_path);

while ( my $file = $it->() ) {
    $xml = read_file($file);
    my $te = HTML::TableExtract->new();
    $te->parse($xml);
    foreach my $ts ( $te->tables ) {
        if ( $ts =~ /kcat/i ) {
            print "Table (", join( ',', $ts->coords ), "):\n";
            foreach my $row ( $ts->rows ) {
                print join( ',', @$row ), "\n";
            }
        }
    }
}

Any ideas on how I should fix this? Thanks in advance! I am fairly new to Perl, so a simple, comprehensible answer would be very much appreciated.

  • What exactly is the problem? Do you get any errors, or does the output differ from what you expect? Show the input, the actual output, and the desired output; that makes it much easier to help you. Commented Feb 14, 2015 at 2:44
  • When I run my code I get the following error: Use of uninitialized value in join or string at ./table_parser.pl line 39. Also, when the program does output something, it is in a very raw form and I can't really discern the table. In other words, how can I get rid of that error and make the output look more like a table? Commented Feb 14, 2015 at 2:54
  • Which join is on line 39? Commented Feb 14, 2015 at 3:52
  • I reflowed your script, but it's not 39 lines long. (I would recommend getting hold of perltidy; it makes formatting your code nicely much easier.) Commented Feb 14, 2015 at 11:55
  • Can you also give an example of your source data? It makes it easier to grok. Commented Feb 14, 2015 at 12:12

1 Answer

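Matching a regex against an object does not search the table's contents. Perl matches against the object's stringified reference (something like Some::Class=HASH(0x...)), so this test never finds kcat: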

if ( $ts =~ /kcat/i ) {
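
To see why that match cannot succeed, here is a minimal sketch. It uses a hypothetical stand-in class, Demo::Table, with a made-up text accessor; neither is part of HTML::TableExtract. It only illustrates how Perl stringifies a plain blessed hash reference when it is used in a pattern match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a table object; NOT part of HTML::TableExtract.
package Demo::Table;

sub new {
    my ( $class, %args ) = @_;
    return bless { text => $args{text} }, $class;
}

# Made-up accessor returning the text the object holds.
sub text { return $_[0]{text} }

package main;

# Matches the pattern against the object itself, i.e. against its
# stringified reference ("Demo::Table=HASH(0x...)").
sub matches_object {
    my ($obj) = @_;
    return "$obj" =~ /kcat/i ? 1 : 0;
}

# Matches the pattern against the text stored inside the object.
sub matches_content {
    my ($obj) = @_;
    return $obj->text =~ /kcat/i ? 1 : 0;
}

my $table = Demo::Table->new( text => 'kcat (s-1) = 12' );
print "object as string: $table\n";
print "match on object:  ", matches_object($table),  "\n";
print "match on content: ", matches_content($table), "\n";
```

The match on the object fails even though the stored text contains kcat, which is why the content has to be pulled out of the object before matching.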

I'd suggest parsing the tables in 'tree' mode. For this, you have to install two additional Perl modules: HTML::TreeBuilder and HTML::ElementTable. Enable it like this:

use HTML::TableExtract 'tree';

Here's the fixed while loop:

while ( my $file = $it->() ) {
  $xml = read_file($file);
  my $te = HTML::TableExtract->new();
  $te->parse($xml);
  foreach my $ts ( $te->tables ) {
    my $tree = $ts->tree or die $!;
    if ( $tree->as_text =~ /kcat/i ) {
      print "Table (", join( ',', $ts->coords ), "):\n";
      # update 18.2.2015: pretty print the table
      foreach my $row ($ts->rows) {
        print join ' | ', map {sprintf "%22s", $_->as_text} @{$row};
        print "\n";
        # which is the same as
        # foreach my $cell (@{$row}) { do something with $cell->as_text }
      }
    }
  }
}

$tree is an HTML::ElementTable object. The code above works with your sample.
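
As for the "Use of uninitialized value in join" warning from the original script: a likely cause, assuming the tables use rowspan or colspan, is that some cells in a row are undefined, which the HTML::TableExtract documentation mentions for such tables. The following is only a sketch of guarding against that. The rows are made-up plain strings rather than a parsed table, and format_row is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: formats one row (an array reference) as a
# fixed-width line. Undefined cells become empty strings first, so
# join and sprintf never see undef.
sub format_row {
    my ( $row, $width ) = @_;
    return join ' | ',
        map { sprintf "%-${width}s", defined($_) ? $_ : '' } @{$row};
}

# Made-up rows for illustration; the undef mimics a cell that is
# covered by a neighbouring cell's colspan or rowspan.
my @rows = (
    [ 'Enzyme',   'kcat (s-1)', 'Km (uM)' ],
    [ 'Kinase A', undef,        '15' ],
);

print format_row( $_, 12 ), "\n" for @rows;
```

The same guard can be dropped into the map in the loop above when the cells are plain strings, as they are in the default (non-tree) mode.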

4 Comments

I have imported the following into my program: use HTML::TreeBuilder; use HTML::ElementTable; use HTML::TableExtract 'tree'; My while loop is the same as yours except that I have added: my $tree = HTML::ElementTable->new();. If I don't include that line, the program gives me the following error: Global symbol "$tree" requires explicit package name. If I do include it, I get the following error: Can't locate object method "ElementTable=HASH(0x1aebc70)" via package "HTML" (it is talking about this line: $tree = $ts->$tree or die $!;). What is wrong?
Sorry. It actually did work! Thank you! Do you know of any way I can print it out in a nicer format (like in table format)? Currently, it is just a block of text.
I updated the sample code in my answer above; it now prints the output as a table on the console.
I made the change but got the following error: Can't call method "as_text" on unblessed reference at ./table_parser.pl line 34. Line 34 is referring to print join ' | ', map {sprintf "%22s", $_->as_text} @{$row}; Any ideas? Thanks for all your help!
