Get multiple value from html with dom (without id or classes)

Question

I'm trying to get the proxy and port values from this HTML page: http://jsbin.com/noxuqusoga/edit?html,output

Here is a sample of the table structure from that page, including only one tr, but the actual HTML has many tr elements with similar structure:

<table class="table" id="tbl_proxy_list" width="950">
    <tbody>
        <tr data-proxy-id="1355950">
            <td align="left"><abbr title="103.227.175.125">103.227.175.125 </abbr></td>
            <td align="left"><a href="/proxy-server-list/port-8080/" title="Port 8080 proxies">8080</a></td>
            <td align="left"><time class="icon icon-check timeago" datetime="2018-08-18 04:56:47Z">9 min ago</time></td>
            <td align="left">
            <div class="progress-bar" data-value="22" title="1089">
            <div class="progress-bar-inner" style="width:22%; background-color: hsl(26.4,100%,50%);">&nbsp;</div>
            </div>
            <small>1089 ms</small></td>
            <td style="text-align:center !important;"><span style="color:#009900;">95%</span> <span> (94)</span></td>
            <td align="left"><img alt="sg" class="flag flag-sg" src="/assets/images/blank.gif" style="vertical-align: middle;" /> <a href="/proxy-server-list/country-sg/" title="Proxies from Singapore">Singapore <span class="proxy-city"> - Bukit Timah </span> </a></td>
            <td align="left"><span class="proxy_transparent" style="font-weight:bold; font-size:10px;">Transparent</span></td>
            <td><span>-</span></td>
        </tr>
  </tbody>
</table>

I'm able to scrape the proxy address, but I'm having difficulty with the port: its <td> has no id or class, and some port values are wrapped in a hyperlink while others are not.

How can I output the whole scrape result in the form ip:port?

Here's my code

$html = file_get_html('http://jsbin.com/noxuqusoga/');

// Find all <abbr> elements (the IP addresses)
foreach($html->find('abbr') as $element)
       echo $element->title . '<br>';

foreach($html->find('td a') as $element)
       echo $element->plaintext . '<br>';

Please help,
Thanks

Use DomNode->next_sibling to get the next td and strip_tags to strip <a> tags.
– Ali Sheikhpour, Commented Aug 18, 2018 at 5:47
That does not look like DOM but a PHP library called SimpleHTMLDOM
– ThW, Commented Aug 20, 2018 at 9:27

Nima · Accepted Answer · 2018-08-20 11:41:33Z

Instead of writing a selector for the td elements (or elements inside them, like abbr or a), write a selector for their parent tr. Then loop over these rows and, for each row, get the children you need:

// Select all tr elements inside tbody
foreach ($html->find('tbody tr') as $row) {
    // The second argument (0) returns only the first element matching the selector

    // The IP is in the first <abbr> inside a td; trim the trailing space in its text
    $ip = trim($row->find('td abbr', 0)->plaintext);
    // The port is in the first <a> inside a td
    $port = $row->find('td a', 0)->plaintext;
    print "$ip:$port\n";
}

As an alternative, besides using CSS selectors you can also get elements by their index. In your case, the data you want from each tr is in its first and second td elements, so you can also take the first and second child of each tr to extract the data.
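
The index-based approach might look like the minimal sketch below. It assumes Simple HTML DOM is available as simple_html_dom.php next to the script (that path is an assumption), and it parses a trimmed inline sample with str_get_html instead of fetching the page. The second row is made up to show a port cell without a link. Reading the text of the whole td should cover both cases, since it does not depend on an <a> being present.

```php
<?php
// Assumption: the Simple HTML DOM library file sits next to this script.
require_once 'simple_html_dom.php';

// Trimmed sample of the table. The second row is hypothetical and only
// illustrates a port cell that has no <a> around the value.
$sample = <<<HTML
<table id="tbl_proxy_list">
  <tbody>
    <tr><td><abbr title="103.227.175.125">103.227.175.125 </abbr></td><td><a href="/proxy-server-list/port-8080/">8080</a></td></tr>
    <tr><td><abbr title="203.0.113.10">203.0.113.10 </abbr></td><td>3128</td></tr>
  </tbody>
</table>
HTML;

$html = str_get_html($sample);

$pairs = [];
foreach ($html->find('tbody tr') as $row) {
    // children(n) returns the n-th child element of the row (0-based), or null
    $ipCell   = $row->children(0);
    $portCell = $row->children(1);

    // Skip rows that do not have at least two cells
    if ($ipCell === null || $portCell === null) {
        continue;
    }

    // plaintext gives the cell text without tags; trim removes stray spaces
    $pairs[] = trim($ipCell->plaintext) . ':' . trim($portCell->plaintext);
}

// One ip:port pair per line
echo implode("\n", $pairs), "\n";
```

To run this against the real page, replace the str_get_html($sample) call with the file_get_html(...) call from the question.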
