I am trying to get the text of the first three td tags in each table row using the PHP Simple HTML DOM Parser and collect them in an array.

The table looks like this:

<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>

        .....
        .....
    </tbody>
</table>

What I am trying to achieve is to collect these in arrays, one per tr, excluding the 4th td of each tr tag:

array(
   array(
      'art' => 'Floyd',
      'thing' => 'Machine',
      'passion' => 'Banking',
   ),
   array(
      'art' => 'Nirvana',
      'thing' => 'Paper',
      'passion' => 'Business',
   ),
   array(
      'art' => 'The edge',
      'thing' => 'Tree',
      'passion' => 'Hospital',
   ),
);

This is what I have tried:

require_once dirname( __FILE__ ) . '/library/simple_html_dom.php';

$html    = file_get_html( 'https://www.example.com/list.html' );
$collect = array();
$list    = $html->find( 'table tbody tr td' );

foreach( $list as $l ) {
    $collect[] = $l->plaintext;
}

$html->clear();
unset($html);

print_r($collect);

This gives me every td in one flat array, which makes it difficult to tell which entries belong to which row and key. Is there a way to get the structure I need?

  • Maybe iterate over the tr so you can grab the first 3 td of each. Commented Sep 20, 2019 at 14:11

1 Answer

Instead of iterating over all td elements at once, you can iterate over each tr and, for each tr, take its inner td elements and skip the 4th:

$htmlString = <<<html
<table>
    <tbody>
        <tr>
            <td>Floyd</td>
            <td>Machine</td>
            <td>Banking</td>
            <td>HelpScout</td>
        </tr>
        <tr>
            <td>Nirvana</td>
            <td>Paper</td>
            <td>Business</td>
            <td>GuitarTuna</td>
        </tr>
        <tr>
            <td>The edge</td>
            <td>Tree</td>
            <td>Hospital</td>
            <td>Sician</td>
        </tr>
    </tbody>
</table>
html;
$html = str_get_html($htmlString);

// find all tr tags
$trs = $html->find('table tr');
$collect = [];

// for each tr tag, find the td elements inside it
foreach ($trs as $tr) {
    $tds = $tr->find('td');
    // skip rows with fewer than 3 cells (e.g. header rows) to avoid undefined offsets
    if (count($tds) < 3) {
        continue;
    }
    // collect the first 3 cells and skip the 4th
    $collect[] = [
        'art' => $tds[0]->plaintext,
        'thing' => $tds[1]->plaintext,
        'passion' => $tds[2]->plaintext,
    ];
}
print_r($collect);

The output is:

Array
(
    [0] => Array
        (
            [art] => Floyd
            [thing] => Machine
            [passion] => Banking
        )

    [1] => Array
        (
            [art] => Nirvana
            [thing] => Paper
            [passion] => Business
        )

    [2] => Array
        (
            [art] => The edge
            [thing] => Tree
            [passion] => Hospital
        )

)
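
For comparison, the same row-by-row idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. This is a different technique from the one above, not a drop-in replacement, and it assumes the DOM extension is enabled. The $keys array and the extra one-cell row are illustrative assumptions added here to show how short rows can be skipped.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension rather than Simple HTML DOM.
$htmlString = <<<HTML
<table>
    <tbody>
        <tr><td>Floyd</td><td>Machine</td><td>Banking</td><td>HelpScout</td></tr>
        <tr><td>Nirvana</td><td>Paper</td><td>Business</td><td>GuitarTuna</td></tr>
        <tr><td>The edge</td><td>Tree</td><td>Hospital</td><td>Sician</td></tr>
        <tr><td>Incomplete</td></tr>
    </tbody>
</table>
HTML;

// Hypothetical key list: one key per column to keep, in column order.
$keys = ['art', 'thing', 'passion'];

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($htmlString);
libxml_clear_errors();

$xpath   = new DOMXPath($doc);
$collect = [];

foreach ($xpath->query('//table//tr') as $tr) {
    // Gather the text of the td elements that are direct children of this row.
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }

    // Skip rows that do not have enough cells (the one-cell row above).
    if (count($cells) < count($keys)) {
        continue;
    }

    // Keep only the first count($keys) cells and pair them with the keys.
    $collect[] = array_combine($keys, array_slice($cells, 0, count($keys)));
}

print_r($collect);
```

Pairing array_slice with array_combine keeps the column-to-key mapping in one place, so adding or dropping a column only means editing $keys.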