I am trying to scrape and extract multiple pages of data holding tables showing the value and quantity (volume) of various goods imported into China every month. Ultimately I want to write this data out to a text file for further processing. Here is a screenshot of such a page.

Specifically I would like to extract the name of the good being imported, the unit of volume (e.g. tons, kg), the actual value and the volume, a total of 4 fields. The problem I have run into is that the tables I want to extract seem to be at different depths.
I can extract the fields for 'volume' and 'value' because they are at the same depth, so I get debug output like this:
2,314 --- 2,943
20,082 --- 80,176
7 --- 62,426
The 'name' and 'units' fields are at different levels (I think) to the 'volume' and 'value' fields so they are not picked up when I use headers for all 4 fields. However, if I try to extract them as a sub-table it works fine, giving this debug output:
啤酒 --- 千升
葡萄酒 --- 千升
饲料用鱼粉 --- 万吨
How should I resolve this? My first thought is to extract each table separately, loop through each row of each table, add the 2 fields in one table and the 2 fields in the other to an array that would have 4 elements for each row. (In R I guess I would create a data frame and use cbind for this.) This seems feasible but it doesn't feel optimal. So first I want to ask:
1) Is there a straightforward way of telling HTML::TableExtract to extract both subsets of the table and combine them?
2) If I have to extract the data as two separate tables and combine them, what is the most efficient way of doing this?
The code I have so far follows:
use strict;
use HTML::TableExtract;
use Encode;
use utf8;
use WWW::Mechanize;
use Data::Dumper;
binmode STDOUT, ":utf8";
# Chinese equivalents of the various headings
my $txt_header = "单位:千美元";
my $txt_name = "商品名称";
my $txt_units = "计量单位";
my $txt_volume = "数量";
my $txt_value = "金额";
# Chinese Customs site
my $url = "http://www.chinacustomsstat.com/aspx/1/newdata/record_class.aspx?page=2&guid=951";
my $mech = WWW::Mechanize->new( agent => 'Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)');
my $page = $mech->get( $url );
my $htmlstuff = $mech->content();
print ("\nFirst table with two headers (volume and value) at same depth\n\n");
my $te = new HTML::TableExtract( depth => 1, headers => [ ( $txt_volume, $txt_value ) ]);
$te->parse($htmlstuff);
# See what we have
foreach my $ts ( $te->tables ) {
print "Table (", join( ',', $ts->coords ), "):\n";
foreach my $row ( $ts->rows ) {
print join( ' --- ', @$row ), "\n";
}
}
print ("\nSecond table with 'name' and 'units'\n");
$te = new HTML::TableExtract( headers => [ ( $txt_name, $txt_units ) ]);
$te->parse($htmlstuff);
# See what we have in the other table
foreach my $ts ( $te->tables ) {
print "Table (", join( ',', $ts->coords ), "):\n";
foreach my $row ( $ts->rows ) {
print join( ' --- ', @$row ), "\n";
}
}