
Is it possible to convert an HTML table to JSON with PHP?

I have this JavaScript:

<script>
(function() {
    var jsonArr = [];
    var obj = {};
    var rowIx = 0;
    var jsonObj = {};
    var thNum = document.getElementsByTagName('th').length;
    var arrLength = document.getElementsByTagName('td').length;

    for(i = 0; i < arrLength; i++){
        if(i%thNum === 0){
            obj = {};
        }
        var head = document.getElementsByTagName('th')[i%thNum].innerHTML;
        var content = document.getElementsByTagName('td')[i].innerHTML;
        obj[head] = content;
        if(i%thNum === 0){
            jsonObj[++rowIx] = obj
        }   
    }           

    var result = "<br>"+JSON.stringify({"Values": jsonObj});
    document.write(result);
})();
</script>

which uses the below HTML code:

<TABLE border="3" rules="all" bgcolor="#E7E7E7" cellpadding="1" cellspacing="1">
<TR>
<TH align=center><font size="3" face="Arial">Date</font></TH>
<TH align=center><font size="3" face="Arial"><B>Teacher</B></font></TH>
<TH align=center><font size="3" face="Arial">?</font></TH>
<TH align=center><font size="3" face="Arial">Hour</font></TH>
<TH align=center><font size="3" face="Arial">Subject</font></TH>
<TH align=center><font size="3" face="Arial">Class</font></TH>
<TH align=center><font size="3" face="Arial">Room</font></TH>
<TH align=center><font size="3" face="Arial">(Teacher)</font></TH>
<TH align=center><font size="3" face="Arial">(Room)</font></TH>
<TH align=center><font size="3" face="Arial">XYY</font></TH>
<TH align=center><font size="3" face="Arial"><B>Information</B></font></TH>
<TH align=center><font size="3" face="Arial">(Le.) nach</font></TH>
</TR>
<TR><TD align=center><font size="3" face="Arial">24.9.</font></TD>
<TD align=center><font size="3" face="Arial"><B><strike>Dohe</strike></B></font></TD>
<TD align=center><font size="3" face="Arial">Free</font></TD>
<TD align=center><font size="3" face="Arial">1</font></TD>
<TD align=center><font size="3" face="Arial"><strike>Math</strike></font> </TD>
<TD align=center><font size="3" face="Arial">(9)</font> </TD>
<TD align=center><font size="3" face="Arial">---</font> </TD>
<TD align=center><font size="3" face="Arial"><strike>Dohe</strike></font></TD>
<TD align=center><font size="3" face="Arial">A001</font></TD>
<TD align=center>&nbsp;</TD>
<TD align=center>&nbsp;</TD>
<TD align=center><font size="3" face="Arial">Free.</font></TD>
</TR>
</TABLE>

to generate this JSON code:

{"Values":{"1":{"Date":"24.9.","Teacher":"Dohe","?":"Free","Hour":"1","Subject":"Math ","Class":"(9) ","Room":"--- ","(Teacher)":"Dohe","(Room)":"A001","XYY":" ","Information":" ","(Le.) nach":"Free."},"2":{"Date":"26.9.","Teacher":"John","?":"Free","Hour":"8","Subject":"Bio ","Class":"(9) ","Room":"--- ","(Teacher)":"John","(Room)":"A021","XYY":" ","Information":" ","(Le.) nach":"Free."}}}

The script works as intended, but I need one that saves the JSON data to a file on the server automatically, without any user interaction.

  • You can post it to a PHP page via AJAX and have the PHP page write it to a file. Commented Oct 6, 2015 at 20:55
  • And how? Do you have a link or something? Commented Oct 6, 2015 at 20:57
  • That depends. What have you tried? Do you want to use jQuery or raw JavaScript? When the PHP gets the data, do you want it written to a txt file or a json file? What should the file be called? Are you sending an email after it's created? Storing a link in a database? I need more info about what you want to accomplish. Commented Oct 6, 2015 at 20:59
  • I want a program to load an HTML page. This page should automatically put the data from the table into a JSON file called subs.json. That's all; nothing else should happen. The user shouldn't have to do anything to make this happen other than load the page. Commented Oct 6, 2015 at 21:04
  • How is the HTML page generated? If it's dynamic, why not have the same data pushed to the file at the same time? If it's static, you have to rely on JavaScript. If the browser does not support JS, is that okay? If two people load the page at the same time, do you want two files generated, or should the second overwrite the first? Append to the first? Still need more info. Commented Oct 6, 2015 at 21:08

3 Answers


If you say your JS logic is perfect, here is a PHP conversion (5.4+, because it uses the short array syntax) that works on the DOM like your code.

These functions load an HTML file (you may use cURL if it is a URL), convert the table, and save the result to a file.

<?php
// Load an HTML file, convert its table to JSON, and write the result to a file.
function save_table_to_json ( $in_file, $out_file ) {
    $html = file_get_contents( $in_file );
    file_put_contents( $out_file, convert_table_to_json( $html ) );
}

function convert_table_to_json ( $html ) {
    // Collect parser warnings instead of printing them; real-world HTML is rarely valid.
    libxml_use_internal_errors( true );
    $document = new DOMDocument();
    $document->loadHTML( $html );

    $obj = [];
    $jsonObj = [];
    $th = $document->getElementsByTagName('th');
    $td = $document->getElementsByTagName('td');
    $thNum = $th->length;
    $arrLength = $td->length;
    $rowIx = 0;

    for ( $i = 0 ; $i < $arrLength ; $i++){
        // The header for a cell is found by its position within the row.
        $head = $th->item( $i%$thNum )->textContent;
        $content = $td->item( $i )->textContent;
        $obj[ $head ] = $content;
        if( ($i+1) % $thNum === 0){ // Last cell of a row: store the row and start a new one.
            $jsonObj[++$rowIx] = $obj;
            $obj = [];
        }
    }

    // Keys start at 1, so json_encode emits an object rather than an array.
    return json_encode([ "Values" => $jsonObj ]);
}

// Example
save_table_to_json( 'table.html', 'data.json' );
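
The row grouping above can also be sketched in JavaScript on plain arrays, which makes the end-of-row check easy to try out without a DOM. The name `groupCells` and its parameters are made up for this illustration, and the sketch assumes every row has exactly as many cells as there are headers (no colspan or rowspan).

```javascript
// Hypothetical helper: groups a flat list of cell texts into row objects.
// Assumes each row has exactly headers.length cells.
function groupCells(headers, cells) {
    var rows = {};
    var row = {};
    var rowIx = 0;
    if (headers.length === 0) {
        return rows; // nothing to key the cells by
    }
    for (var i = 0; i < cells.length; i++) {
        row[headers[i % headers.length]] = cells[i];
        // A row is complete after its last cell, not at its first one.
        if ((i + 1) % headers.length === 0) {
            rows[++rowIx] = row;
            row = {};
        }
    }
    return rows;
}

var grouped = groupCells(
    ["Date", "Hour"],
    ["24.9.", "1", "26.9.", "8"]
);
console.log(JSON.stringify({ Values: grouped }));
```

In a browser, the two arrays could be filled from the `textContent` of the `th` and `td` elements, which also leaves out nested markup such as `<font>` tags.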

9 Comments

Thank you for the answer. This gives me this: {"Values":[{"Datum":"23.10.","Vertreter":"KUL","Art":"Vertretung","Stunde":"9","Fach":"O5","Klasse(n)":"5Z","Raum":"C111","(Lehrer)":"GAT","(Raum)":"C310","Vertr. von":"\u00a0","Vertretungs-Text":"\u00a0","(Le.) nach":"Klassenfahrt 10 A"},{"Datum":"2. So it puts the data in an array. But what I need is this: { "Values": { "1": { "Head":"data", "Head2":"Data2"},"2"{ also with the ID, which should be generated automatically. Could you help me?
@MariusSchönefeld Very observant. It is true, there are languages where it matters. I have re-added the rowIx variable, which should get you the result you need :)
Hey, I have a problem with the script now. The console says "Unexpected token >" at this line: $document->loadHTML( $html );
@MariusSchönefeld Check that all > and & in the HTML are properly escaped. Many people write non-conforming HTML (v4) and rely on browsers to interpret the actual intention. (And v5 got fed up with the inconsistencies and defined these interpretations.)
Thanks for the answer. Here is my HTML; I don't see a problem: pastebin.com/CSg4Qi7T

Personally, I like using jQuery, but if you don't know it or can't include it, plain JavaScript will do. We will post the JSON data to a PHP script via AJAX once the page has loaded. The PHP then writes this JSON to a file on the server called subs.json, which is overwritten each time the script runs.

We will start with the JavaScript:

<script>
// Build an object of table rows keyed by row number (1, 2, ...).
function collectData() {
    var obj = {};
    var rowIx = 0;
    var jsonObj = {};
    var thNum = document.getElementsByTagName('th').length;
    var arrLength = document.getElementsByTagName('td').length;

    for (var i = 0; i < arrLength; i++) {
        if (i % thNum === 0) {
            // First cell of a row: start a new object and register it.
            obj = {};
            jsonObj[++rowIx] = obj;
        }
        var head = document.getElementsByTagName('th')[i % thNum].innerHTML;
        var content = document.getElementsByTagName('td')[i].innerHTML;
        obj[head] = content;
    }
    return jsonObj;
}

// POST the collected data to the PHP script as the form field "values".
function postJSONData(json){
    var xmlhttp = new XMLHttpRequest();
    xmlhttp.open("POST", "/subPost.php");
    xmlhttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xmlhttp.onreadystatechange = function() {
        if(xmlhttp.readyState == 4 && xmlhttp.status == 200) {
            var return_data = xmlhttp.responseText;
            alert(return_data);
        }
    }
    xmlhttp.send("values="+JSON.stringify(json));
}

postJSONData(collectData());
</script>

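At this point, the page posts your JSON to a PHP script called subPost.php. Note that the URL /subPost.php starts with a slash, so it resolves against the root of the site; use a relative URL such as subPost.php if the script sits next to the page that runs this JS. The PHP looks like this: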

<?php
if(isset($_POST['values'])){
        $values = $_POST['values'];
        $fp = fopen('subs.json', 'w');
        fwrite($fp, $values);
        fclose($fp);
        echo "Values written to subs.json.\r\n";
} else {
        echo "No Post data received.";
}
?>

I made a working example you can see here: http://www.yrmailfrom.me/projects/testPost/ and the content of http://www.yrmailfrom.me/projects/testPost/subs.json is:

{"1":{"<font face=\"Arial\" size=\"3\">Date</font>":"<font face=\"Arial\" size=\"3\">24.9.</font>","<font face=\"Arial\" size=\"3\"><b>Teacher</b></font>":"<font face=\"Arial\" size=\"3\"><b><strike>Dohe</strike></b></font>","<font face=\"Arial\" size=\"3\">?</font>":"<font face=\"Arial\" size=\"3\">Free</font>","<font face=\"Arial\" size=\"3\">Hour</font>":"<font face=\"Arial\" size=\"3\">1</font>","<font face=\"Arial\" size=\"3\">Subject</font>":"<font face=\"Arial\" size=\"3\"><strike>Math</strike></font> ","<font face=\"Arial\" size=\"3\">Class</font>":"<font face=\"Arial\" size=\"3\">(9)</font> ","<font face=\"Arial\" size=\"3\">Room</font>":"<font face=\"Arial\" size=\"3\">---</font> ","<font face=\"Arial\" size=\"3\">(Teacher)</font>":"<font face=\"Arial\" size=\"3\"><strike>Dohe</strike></font>","<font face=\"Arial\" size=\"3\">(Room)</font>":"<font face=\"Arial\" size=\"3\">A001</font>","<font face=\"Arial\" size=\"3\">XYY</font>":"

This is not valid JSON: the data is cut off partway through. The cause is the &nbsp; entities in the values. In a form-encoded request body, & separates fields, so PHP treats everything after the & as the start of a new field. I see this in the posted data:

"<font face=\"Arial\" size=\"3\">XYY</font>":"
    [nbsp;","<font_face] => \"Arial\" size=\"3\">(Le.) nach</font>":"<font face=\"Arial\" size=\"3\"
>Free.</font>"}}

I was able to overcome this with another small JS function:

function nbsp2space(str) {
    return String(str).replace(/&nbsp;/g, ' ');
}

Then use this function in collectData() like so:

obj[head] = nbsp2space(content);

Now when the page executes, we post the data to the PHP and it's written to the file subs.json.
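
A more general alternative to replacing `&nbsp;` is to URL-encode the value before sending it, so that `&`, `=`, `+` and similar characters in the JSON cannot be mistaken for form syntax. The following is a minimal sketch, assuming the same `values` field name as above. `buildBody` is a name invented here, and `URLSearchParams` is used only to mimic how a server splits the request body.

```javascript
// Hypothetical helper: builds an x-www-form-urlencoded body for one field.
function buildBody(name, data) {
    return encodeURIComponent(name) + "=" + encodeURIComponent(JSON.stringify(data));
}

var sample = { "1": { "XYY": "&nbsp;", "Room": "A001" } };

// Unencoded: the "&" inside "&nbsp;" ends the field early.
var rawBody = "values=" + JSON.stringify(sample);
var rawValue = new URLSearchParams(rawBody).get("values");

// Encoded: the whole JSON string arrives as one field.
var safeBody = buildBody("values", sample);
var safeValue = new URLSearchParams(safeBody).get("values");

console.log(rawValue);                              // cut off at the "&"
console.log(safeValue === JSON.stringify(sample));  // round trip is intact
```

With this approach the last line of postJSONData() would become `xmlhttp.send(buildBody("values", json));`, and PHP decodes the field back to the original JSON string on its own.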

10 Comments

Thank you for the code! I integrated it on my server, but the JSON format is still incorrect.
What are you seeing in your subs.json? Also, using Firebug in Firefox, what exactly are you posting?
In subs.json it says {"1":{"<font size=\"3\" face=\"Arial\">Date</font>":"<font size=\"3\" face=\"Arial\">24.9.</font>",... and this is exactly what I am posting.
The script takes everything between <TH align=center> and </TH>, which also includes the <font size="3" face="Arial"> </font> and <strike> </strike> tags. These must be excluded, but how?
If I add the line document.write("<br>"+JSON.stringify({"Values": jsonObj})); to the script, it prints the completely correct result. It also gives the ID numbers 1, 2, 3, etc. So the error must be in the AJAX part, but I don't know AJAX.

You can try something like this:

HTML

<table>
<tr>
<th>No</th>
<th>Name</th>
<th>Email</th>
</tr>
<tr>
<td>1</td>
<td>Test</td>
<td>[email protected]</td>
</tr>
<tr>
<td>2</td>
<td>Test 2</td>
<td>[email protected]</td>
</tr>
<tr>
<td>3</td>
<td>Test 3</td>
<td>[email protected]</td>
</tr>
</table>

JavaScript

<script type="text/javascript">
jQuery(document).ready(function($){
    var data = [];
    var columns = [];
    $('table tr').each(function(index, tr){
        if (index === 0) { // First we get column names from th.
            $(tr).find('th').each(function(thIndex, thValue){
                columns.push($(thValue).text());
            });
        } else {
            var row = []; // Fresh row for each tr
            $(tr).find('td').each(function(tdIndex, tdValue){
                row[tdIndex] = $(tdValue).text(); // Put each td value in row
            });
            data.push(row); // Now push each row in data.
        }
    });
    // Send it to PHP for further work:
    $.post('json.php', { data: data, columns: columns }, function(response){
        // TODO with response
    });
});
</script>

json.php

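<?php
$data = $_POST['data']; // Each row's values
$columns = $_POST['columns']; // Column names

$json = array();
for($i = 0; $i < count($data); $i++) {

  // Pair each column name with the value at the same position in the row.
  $json[] = array(($i+1) => array_combine($columns, $data[$i]));

}

// Wrap the rows in a "values" key so the result matches the output below.
$json = json_encode(array('values' => $json));
// TODO with $json eg: file_put_contents();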

The output you will get after json_encode() is:

{"values":[{"1":{"No":"1","Name":"Test","Email":"[email protected]"}},{"2":{"No":"2","Name":"Test 2","Email":"[email protected]"}},{"3":{"No":"3","Name":"Test 3","Email":"[email protected]"}}]}

Note jQuery must be included before running this.
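
If the goal is a single object keyed by row number rather than an array of single-key objects, the combination step can be sketched as below. This is a JavaScript illustration of the same pairing that array_combine() does, not part of the code above. `combineRows` is a hypothetical name, and the sketch assumes every row has as many values as there are columns.

```javascript
// Hypothetical helper: pairs column names with row values and keys the
// rows by their 1-based position.
function combineRows(columns, data) {
    var values = {};
    data.forEach(function (row, index) {
        var entry = {};
        columns.forEach(function (name, colIndex) {
            entry[name] = row[colIndex];
        });
        values[index + 1] = entry; // row numbers start at 1
    });
    return { values: values };
}

var combined = JSON.stringify(combineRows(
    ["No", "Name"],
    [["1", "Test"], ["2", "Test 2"]]
));
console.log(combined);
```

The resulting string could then be posted as a single field and written to the file as-is on the PHP side.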

8 Comments

Thank you for the code. The output should be { "values" : { "1": {"Name":"Test","Email":"[email protected]"}, { "2" : { "Name":"Test 2","Email":"[email protected]"}
Then you can simply do: $json = json_encode(array('values' => $json)); echo $json;
pastebin.com/1ufPXPKn and pastebin.com/DJvk9wiu: this is my code, but it doesn't work.
Any error in the console? I've tested it and it's working.
Yes: ReferenceError: Can't find variable: jQuery (anonymous function) test1.html:24