
I have an HTML table in the format below. The first header, Header 1, has one row (Row 1) associated with it. Header 2 has two rows, Row 2 and Row 3, and Header 3 has three rows: Row 4, Row 5 and Row 6.

<table>
<thead>
    <tr>
        <th>Header 1</th>
    </tr>
</thead>
<tbody>
        <tr>
            <td>
                Row 1
            </td>
        </tr>
</tbody>
<thead>
    <tr>
        <th>Header 2</th>
    </tr>
</thead>
<tbody>
        <tr>
            <td>
                Row 2
            </td>
        </tr>
        <tr>
            <td>
                Row 3
            </td>
        </tr>

</tbody>
<thead>
    <tr>
        <th>Header 3</th>
    </tr>
</thead>
<tbody>
        <tr>
            <td>
                Row 4
            </td>
        </tr>
        <tr>
            <td>
                Row 5
            </td>
        </tr>
        <tr>
            <td>
                Row 6
            </td>
        </tr>
</tbody>
</table>

I want to use the PHP Simple HTML DOM Parser to extract the following data:

Header 1, Row 1
Header 2, Row 2, Row 3
Header 3, Row 4, Row 5, Row 6

When I use the parser to get the header tags, they are all stored in one array, and all the row tags end up in another array when I loop over them with foreach. How do I preserve the association between each header and its rows while looping?

  • Any reason why you don't use the built-in DOMDocument interface? Commented Oct 11, 2017 at 19:41
  • Show your code please. Which foreach are you referring to? Commented Oct 11, 2017 at 19:45

2 Answers


You could use PHP's built-in DOMDocument interface to do this. If your HTML is stored in the variable $html, then do:

$dom = new DOMDocument();
$dom->loadHTML($html);
$arr = []; // one sub-array per header group
foreach ($dom->getElementsByTagName('tr') as $row) {
    // A row inside <thead> starts a new group
    if ($row->parentNode->tagName === 'thead') $arr[] = [];
    $arr[count($arr)-1][] = trim($row->textContent);
}

After running the above, the variable $arr will have this content:

[
    ['Header 1', 'Row 1'],
    ['Header 2', 'Row 2', 'Row 3'],
    ['Header 3', 'Row 4', 'Row 5', 'Row 6']
]
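If the goal is the exact comma-separated output from the question, each group in $arr can simply be joined, since every group already starts with its header. A minimal sketch, where the literal array is only a stand-in for the parsed result shown above:

```php
<?php
// Stand-in for the $arr produced by the DOMDocument loop above.
$arr = [
    ['Header 1', 'Row 1'],
    ['Header 2', 'Row 2', 'Row 3'],
    ['Header 3', 'Row 4', 'Row 5', 'Row 6'],
];

// Each group is [header, row, row, ...], so joining with ", " gives one line per header.
$lines = array_map(function (array $group): string {
    return implode(', ', $group);
}, $arr);

echo implode(PHP_EOL, $lines), PHP_EOL;
```

This prints the three lines listed in the question, one per header.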

Without seeing your existing PHP code, it is difficult to say exactly how to change what you have, but something like this would work for your use case:

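//Assuming $html = str_get_html($markup), i.e. a Simple HTML DOM object for your html block
$heads = $html->find('thead');
$result = array();

foreach($heads as $head){
    // Simple HTML DOM property names are lowercase: innertext, not innerText
    $headerText = trim($head->find('th')[0]->innertext);
    $result[$headerText] = array();
    // The rows for this header live in the <tbody> right after the <thead>
    $rows = $head->next_sibling()->find('td');
    foreach($rows as $row){
        $result[$headerText][] = trim($row->innertext);
    }
}

//Output, one line per header
foreach($result as $header => $rows){
    echo $header . ': ' . implode(', ', $rows) . PHP_EOL;
}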

Some caveats: the above is a simple, fairly naive implementation. For example, it assumes that each thead contains exactly one th, and that each thead is immediately followed by the tbody holding its rows.

Also, if echoing is really all you want to do, it would be more efficient to echo directly inside the parsing loop. I kept the output separate because I assume you want to do more than just print the data to the screen.

Note that it would be fairly simple to do the same thing with PHP's native DOM parser; I am assuming you need Simple HTML DOM for some other reason.
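For comparison, here is a rough sketch of the native approach that builds the same header => rows array as the Simple HTML DOM code. It swaps in DOMXPath to pair each thead with the first tbody that follows it; the sample $html is a compact copy of the question's table, and like the code above it assumes one th per thead.

```php
<?php
// Sample input: a compact copy of the question's table.
$html = <<<HTML
<table>
<thead><tr><th>Header 1</th></tr></thead>
<tbody><tr><td>Row 1</td></tr></tbody>
<thead><tr><th>Header 2</th></tr></thead>
<tbody><tr><td>Row 2</td></tr><tr><td>Row 3</td></tr></tbody>
<thead><tr><th>Header 3</th></tr></thead>
<tbody><tr><td>Row 4</td></tr><tr><td>Row 5</td></tr><tr><td>Row 6</td></tr></tbody>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//thead') as $thead) {
    // Take the first th in this thead; skip a thead with no header cell.
    $th = $xpath->query('.//th', $thead)->item(0);
    if ($th === null) {
        continue;
    }
    $header = trim($th->textContent);

    // The rows belong to the first tbody that follows this thead.
    $rows = [];
    foreach ($xpath->query('following-sibling::tbody[1]//td', $thead) as $td) {
        $rows[] = trim($td->textContent);
    }
    $result[$header] = $rows;
}

// Print in the "Header, Row, Row" format from the question.
foreach ($result as $header => $rows) {
    echo $header, ', ', implode(', ', $rows), PHP_EOL;
}
```

Because each tbody is located relative to its own thead, the association survives even if the headers and rows are later processed separately.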

1 Comment

Thanks, it worked perfectly. @trincot's solution worked as well.
