I have this HTML code from my company site. Since I do not have access to the database, I want to parse through an HTML file and return the values. The code looks like this:

<?php
$string = '
<p> <b>HEADER INFO</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>View Object:</b> 6600422</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>BPO:</b> G37147359-000000</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Ack Date:</b> 2012-05-28</font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=3><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Operation(s):</b> PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>EXTERNAL ORDER NUMBER REFERENCE</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>SAP Sales Order Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Customer P.O. Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Legacy Order Number</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">0310363858</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">77340892008-120413</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">89FF09378001</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>PL</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Product #</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Qty</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Options</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Serial #</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">3C</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">AP703B</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">1</font></td>
    <td valign=top colspan=1>&nbsp </td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">2S6219000G</font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>Station Info</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Start Station:</b> JPN_End</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Location:</b> Done</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Station:</b> </font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Birth Date/Time:</b> 2012-05-23 14:20:32 SGT</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Power Cord:</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Voltage:</b></font></td>
  </tr>
</table>
</p>
<hr>
<p> <b>MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)</b>
<table width=100% cellspacing=0>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Part Number</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Qty</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Description</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>BB Type</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Material Location</b></font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\"><b>Serial Number</b></font></td>
  </tr>
  <tr align=left>
    <td width=2% colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">&nbsp;&nbsp;</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">AP703B@@</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">1</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">OEM Generic 1U SAS Enclosure</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">BOM</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">ASSY</font></td>
    <td valign=top colspan=1><font face=\"verdana, arial, helvetica\" size=\"-2\">2S6219000G</font></td>
  </tr>
</table>
</p>
 ';

 $result = parse_data($string);

extract($result);

echo $headertext.'<br />';
echo $sapSON.'<br />';
echo $custPON.'<br />';
echo $legacyON.'<br />';
echo $pl.'<br />';
echo $pn.'<br />';


function parse_data($string){
$string = str_replace('&nbsp;&nbsp;','',$string);

$xml = new DOMDocument();
@$xml->loadHTML($string);

$ret = array();

foreach($xml->getElementsByTagName('p') as $p) {
    $header = trim($p->nodeValue);
}

foreach($xml->getElementsByTagName('td') as $td) {
    $value = trim($td->nodeValue);
    if(!empty($value) && is_numeric($value[0])){ // $value{0} offsets were removed in PHP 8
        $ret[] = $value;
    }
}

$ret = array('headertext'=>$header,
             'sapSON'=>$ret[0],
             'custPON'=>$ret[1],
             'legacyON'=>$ret[2],
             'pl'=>$ret[3],
             'pn'=>$ret[4],);

return $ret;
}
?>

Now I want to save the header "EXTERNAL ORDER NUMBER REFERENCE" into a variable which I can use later on.

Also, the second, third and fourth columns of the first row are the labels for the values in the second, third and fourth columns of the second row, respectively. I also want to save these values to variables. So basically, I need a PHP script which will parse this HTML file and return the following:

$header1 = "HEADER INFO";
$viewObject = "6600422";
$BPO = "G37147359-000000";
$AckDate = "2012-05-28";
$Operations = "PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End";
$header2 = "EXTERNAL ORDER NUMBER REFERENCE";
$sapSON = "0310363858";
$custPON = "77340892008-120413";
$legacyON = "89FF09378001";
$header3 = "PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)";
$pl = "3C";
$pn = "AP703B";
$qty = "1";
$options = "&nbsp;";
$serialNo = "2S6219000G";

And so on. Basically, I need all the table contents saved into variables, because I will later save them to my database, create a report from them, and generate barcodes for some of the details.

Thanks for the help!

FYI: I do not have access to the source database, so all I can do is parse through this HTML file and save the values to variables, which I can store in my own database later on. Also, note that the headers are constant; the only values that change are the numbers, which differ from order to order.

  • See How to parse and process HTML with PHP? Commented May 28, 2012 at 2:26
  • Your company site? You should tell them the font tag is deprecated. By using CSS, you could count how many bytes you save in bandwidth. Commented May 28, 2012 at 2:32
  • @ChristianVarga, I don't know where to start because I'm just beginning PHP. Commented May 28, 2012 at 2:56
  • @bsdnoobz, thanks for the link. will read on... Commented May 28, 2012 at 2:57
  • @LawrenceCherone, I have used your code but I can't seem to make it work for this instance. Commented May 29, 2012 at 5:28

1 Answer

Here, try this:

<?php
$string = '<p> <b>EXTERNAL ORDER NUMBER REFERENCE</b>
    <table width=100% cellspacing=0>
      <tr align=left>
        <td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">&nbsp;&nbsp;</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>SAP Sales Order Number</b></font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Customer P.O. Number</b></font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2"><b>Legacy Order Number</b></font></td>
      </tr>
      <tr align=left>
        <td width=2% colspan=1><font face="verdana, arial, helvetica" size="-2">&nbsp;&nbsp;</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">0310363858</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">77340892008-120413</font></td>
        <td valign=top colspan=1><font face="verdana, arial, helvetica" size="-2">89FF09378001</font></td>
  </tr>
    </table>
</p>
';

$result = parse_data($string);

extract($result);

echo $headertext.'<br />';
echo $sapSON.'<br />';
echo $custPON.'<br />';
echo $legacyON.'<br />';


function parse_data($string){
    $string = str_replace('&nbsp;&nbsp;','',$string);

    $xml = new DOMDocument();
    @$xml->loadHTML($string);

    $ret = array();

    foreach($xml->getElementsByTagName('p') as $p) {
        $header = trim($p->nodeValue);
    }

    foreach($xml->getElementsByTagName('td') as $td) {
        $value = trim($td->nodeValue);
        if(!empty($value) && is_numeric($value[0])){ // $value{0} offsets were removed in PHP 8
            $ret[] = $value;
        }
    }

    $ret = array('headertext'=>$header,
                 'sapSON'=>$ret[0],
                 'custPON'=>$ret[1],
                 'legacyON'=>$ret[2]);

    return $ret;
}
?>

Edit version 2 (Multiple rows):

As your table is different for each iteration it becomes quite complex, but I like a challenge. Here you go, hope it helps...

<?php
$result = parse_data($string);

//Create Variables From Values
foreach($result as $key=>$value){
    foreach($value as $key_b=>$value_b){
        $$key_b = $value_b;
    }
}
/* --New Available Variables--
    $header0 = HEADER INFO
    $ViewObject = 6600422
    $BPO = G37147359-000000
    $AckDate = 2012-05-28
    $Operations = PPS_Queue, PPS_Build, PPS_BoxAll, JPN_End
    $header1 = EXTERNAL ORDER NUMBER REFERENCE
    $SAPSalesOrderNumber = 0310363858
    $CustomerPONumber = 77340892008-120413
    $LegacyOrderNumber = 89FF09378001
    $header2 = PRODUCTS FOR THIS WORK OBJECT/OPERATION(S)
    $PL = 3C
    $Product = AP703B
    $Qty = 1
    $Options =  
    $Serial = 2S6219000G
    $header3 = Station Info
    $StartStation = JPN_End
    $Location = Done
    $Station = 
    $BirthDateTime = 2012-05-23 14:20:32 SGT
    $PowerCord = 
    $Voltage = 
    $header4 = MATERIAL LIST FOR THIS WORK OBJECT/OPERATION(S)
    $PartNumber = AP703B@@
    $Description = OEM Generic 1U SAS Enclosure
    $BBType = BOM
    $MaterialLocation = ASSY
    $SerialNumber = 2S6219000G
*/

function parse_data($string){
    $string = str_replace('&nbsp;&nbsp;','',$string);
    $parts = explode('<hr>',$string);

    $html = new DOMDocument();
    $ret = array();
    $entry=0;
    foreach($parts as $part){
        @$html->loadHTML($part);
        //Get Header
        foreach($html->getElementsByTagName('p') as $p) {
            $ret[$entry]['header'.$entry] = trim($p->nodeValue);
        }
        $i=0;
        foreach($html->getElementsByTagName('td') as $td){
            $value = trim($td->nodeValue);
            if(empty($value)){
                continue;
            }
            switch($entry){
                case 0:
                    //Limit of 2: split on the first colon only
                    $split = explode(':',$value,2);
                    $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $split[0])] = trim($split[1]);
                    break;
                case 1:
                    //$value[0] replaces $value{0}, which was removed in PHP 8
                    if(!is_numeric($value[0])){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-3])] = trim($value);
                        unset($ret[$entry][$i-3]);
                    }
                    break;
                case 2:
                    if($i<=4){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-5])] = trim($value);
                        unset($ret[$entry][$i-5]);
                    }
                    break;
                case 3:
                    //Limit of 2 keeps the colons inside "14:20:32" in the value
                    $split = explode(':',$value,2);
                    $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $split[0])] = trim($split[1]);
                    break;
                case 4:
                    if($i<=5){
                        $ret[$entry][$i] = trim($value);
                    }else{
                        $ret[$entry][preg_replace('/[^a-zA-Z]/s', '', $ret[$entry][$i-6])] = trim($value);
                        unset($ret[$entry][$i-6]);
                    }
                    break;
            }
            $i++;
        }
        $entry++;
    }
    return $ret;
}
?>
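
The version above hard-codes the section index and the number of columns in each table. As a rough alternative, here is a minimal sketch that relies on the markup instead: a bold label ending in a colon is treated as a "label: value" cell, a row of bold labels without colons is treated as a header row, and rows without bold text are treated as data rows. The function names (clean_text, parse_section, parse_report) and the shortened $sample are made up for illustration, and the sketch assumes the first cell of every row is the spacer column, as in the HTML from the question.

```php
<?php
// Hypothetical helpers, not part of the answer above.

/** Turn non-breaking spaces into plain spaces, collapse whitespace, trim. */
function clean_text(string $text): string
{
    // loadHTML() decodes &nbsp; to U+00A0, i.e. the UTF-8 bytes C2 A0.
    $text = str_replace("\xC2\xA0", ' ', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

/** Parse one <hr>-separated part into header, fields and rows. */
function parse_section(string $part): ?array
{
    if (trim($part) === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about old markup
    $doc->loadHTML($part);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $firstBold = $xpath->query('//b')->item(0);   // the section title is the first <b>
    if ($firstBold === null) {
        return null;
    }
    $section = ['header' => clean_text($firstBold->textContent), 'fields' => [], 'rows' => []];

    $labels = null; // column labels from the most recent header row
    foreach ($xpath->query('//tr') as $tr) {
        $rowLabels = [];
        $values = [];
        $index = 0;
        foreach ($xpath->query('./td', $tr) as $td) {
            if ($index++ === 0) {
                continue; // assumption: the first cell is always the spacer column
            }
            $text = clean_text($td->textContent);
            $bold = $xpath->query('.//b', $td)->item(0);
            $label = $bold ? clean_text($bold->textContent) : '';
            if ($label !== '' && substr($label, -1) === ':') {
                // "Label: value" cell; the value is whatever follows the bold label
                $section['fields'][rtrim($label, ':')] = trim(substr($text, strlen($label)));
            } elseif ($label !== '') {
                $rowLabels[] = $label;  // header cell
            } else {
                $values[] = $text;      // data cell, kept even when empty
            }
        }
        if ($rowLabels) {
            $labels = $rowLabels;
        } elseif ($labels !== null && count($values) === count($labels)) {
            $section['rows'][] = array_combine($labels, $values);
        }
    }
    return $section;
}

/** Split the page on <hr> and parse every part that has a title. */
function parse_report(string $html): array
{
    $sections = [];
    foreach (explode('<hr>', $html) as $part) {
        $section = parse_section($part);
        if ($section !== null) {
            $sections[] = $section;
        }
    }
    return $sections;
}

// Shortened sample in the same shape as the HTML from the question.
$sample = '<p> <b>HEADER INFO</b>
<table><tr><td>&nbsp;&nbsp;</td><td><b>View Object:</b> 6600422</td>
<td><b>Birth Date/Time:</b> 2012-05-23 14:20:32 SGT</td></tr></table></p>
<hr>
<p> <b>PRODUCTS</b>
<table>
<tr><td>&nbsp;</td><td><b>PL</b></td><td><b>Options</b></td></tr>
<tr><td>&nbsp;</td><td>3C</td><td>&nbsp;</td></tr>
</table></p>';

$sections = parse_report($sample);
print_r($sections);
```

Keeping the result as nested arrays, rather than creating one variable per label, avoids name clashes such as the two Qty columns, and it should cope with tables that have more than one data row.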

6 Comments

The only problem I see is if you have more data than you've given in your example, like more than one row of data or multiple p tags.
Which is the case on my side. This is multi-table output, so I have to parse through all the data. I'll see what I can work out. Thank you very much! With this, I can start parsing through the HTML and maybe find a workaround for the few bumps ahead :D
You can assign each row to a sub-array; then, when using extract, the rows will be available like $sapSON[0] or $sapSON[1].
Your code works great! However, it doesn't fetch all the data I need when I put in the whole HTML page I need to parse. Thanks for the help; if you could kindly help me troubleshoot some more, please see my edit. Thank you!
Wow! Let me try this! Thanks!
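
The sub-array suggestion from the comments above could look roughly like this. It is only a sketch: rows_to_columns is a made-up helper name, and the second row holds placeholder values rather than real order data.

```php
<?php
// Hypothetical helper: turn a list of rows into one array per column name.
function rows_to_columns(array $rows): array
{
    $columns = [];
    foreach ($rows as $row) {
        foreach ($row as $name => $value) {
            $columns[$name][] = $value; // append this row's value to its column
        }
    }
    return $columns;
}

$rows = [
    ['sapSON' => '0310363858', 'custPON' => '77340892008-120413', 'legacyON' => '89FF09378001'],
    // Placeholder second row, for illustration only.
    ['sapSON' => '0000000001', 'custPON' => 'PLACEHOLDER-PO', 'legacyON' => 'PLACEHOLDER-LEGACY'],
];

$columns = rows_to_columns($rows);
extract($columns); // creates $sapSON, $custPON and $legacyON as arrays

echo $sapSON[0], "\n";
echo $sapSON[1], "\n";
```

Since extract() creates variables from array keys, it is safest when the keys are fixed names you chose yourself, as here, rather than text taken from the parsed page.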