PHP DOM/xpath check element span class value

Question

Within a curl request I have an HTML table with the structure below. I want to extract only the table rows that contain a span element with an empty class, not the ones with class="subcomponent". I successfully used XPath to find the span elements with the empty class, but how do I get the entire <tr>, or even better, the specific <td> nodes that contain Version and Partnumber? Thanks in advance.

<table>
...
<tbody>
    <tr>
        <td></td>
        <td></td>
        <td>
            <span class="">Product</span>
        </td>
        <td>Version</td>
        <td>Partnumber</td>
    </tr>
    <tr>
        <td></td>
        <td></td>
        <td>
            <span class="subcomponent">Component</span>
        </td>
        <td>Version</td>
        <td>Partnumber</td>
    </tr>
</tbody>

My PHP code

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
$xpath = new DOMXPath($doc);
$query = '//span[@class=""]';
$entries = $xpath->query($query);

foreach ($entries as $entry) {
    echo $entry->C14N();
}

iainn · Accepted Answer · 2017-09-12 15:00:55Z

2

To access the table rows themselves using SimpleXML, you can use the following:

$sxml = simplexml_load_string('<table>...</table>');

$rows = $sxml->xpath('//tr[td/span[@class=""]]');

foreach ($rows as $row) {
  echo "Version: ", $row->td[3], ", Partnumber: ", $row->td[4];
}

The XPath selects every <tr> that has a child <td> which in turn has a child <span> whose class attribute is present but empty. A <span> with no class attribute at all is not matched.

In the loop, you need to access the child cells of each row by number, since your sample doesn't indicate that they're labelled any other way. I'm assuming a table structure won't change too often though, so that should be fine.

See https://eval.in/860169 for an example.
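As a self-contained illustration of that predicate, here is a minimal sketch that runs the same XPath against a small inline table. The cell values (1.0, PN-001 and so on) and the third row, whose span has no class attribute, are assumptions added for the demonstration and are not part of the original markup.

```php
<?php
// Sample markup modelled on the table in the question. The cell values and
// the third row (a span without any class attribute) are made up.
$html = <<<XML
<table>
  <tbody>
    <tr><td/><td/><td><span class="">Product</span></td><td>1.0</td><td>PN-001</td></tr>
    <tr><td/><td/><td><span class="subcomponent">Component</span></td><td>2.0</td><td>PN-002</td></tr>
    <tr><td/><td/><td><span>Other</span></td><td>3.0</td><td>PN-003</td></tr>
  </tbody>
</table>
XML;

$sxml = simplexml_load_string($html);

// Keep only rows where a <td> holds a <span> with class present and empty.
$rows = $sxml->xpath('//tr[td/span[@class=""]]');

$result = [];
foreach ($rows as $row) {
    // SimpleXML child indexes are zero-based: td[3] is the fourth cell.
    $result[] = [(string) $row->td[3], (string) $row->td[4]];
}

print_r($result);
```

Only the first row should end up in $result: the second is excluded because its class is "subcomponent", the third because it has no class attribute.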

Alternative DOMDocument Version

If you're fetching a full webpage, which won't necessarily be well-formed XML, you might need to use DOMDocument as you have in your first example. It's a bit less clean to access the child elements, but something like the following will work:

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$doc->loadHTML($page);
$xpath = new DOMXPath($doc);
$rows = $xpath->query('//tr[td/span[@class=""]]');

foreach ($rows as $row) {
    $cells = $row->getElementsByTagName('td');

    // item() is zero-based: the fourth and fifth cells hold the values
    $version = $cells->item(3)->nodeValue;
    $partNumber = $cells->item(4)->nodeValue;

    echo "Version: {$version}, Part Number: {$partNumber}", PHP_EOL;
}

See https://eval.in/860217
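If you would rather pick the cells with XPath as well, a variation is to evaluate a second expression relative to each matched row. This is only a sketch: the $page string below is a hypothetical stand-in for the curl response, and the cell values are made up.

```php
<?php
// Hypothetical stand-in for the HTML fetched via curl.
$page = '<table><tbody>'
    . '<tr><td></td><td></td><td><span class="">Product</span></td><td>1.0</td><td>PN-001</td></tr>'
    . '<tr><td></td><td></td><td><span class="subcomponent">Component</span></td><td>2.0</td><td>PN-002</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$found = [];
foreach ($xpath->query('//tr[td/span[@class=""]]') as $row) {
    // Passing $row as the context node makes the expression relative to it.
    // XPath positions are 1-based, so td[4] corresponds to item(3).
    $version = $xpath->evaluate('string(td[4])', $row);
    $partNumber = $xpath->evaluate('string(td[5])', $row);
    $found[] = [$version, $partNumber];
}

print_r($found);
```

Wrapping the path in string() means a missing cell yields an empty string rather than a null node, which avoids an error on rows that are shorter than expected.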

edited Sep 12, 2017 at 15:00

answered Sep 12, 2017 at 14:36


3 Comments

Eike Over a year ago

I get the table through a curl command and have stored it in $page. How would I make that work with your code?

iainn Over a year ago

If the page is well-formed, you should just be able to use $sxml = simplexml_load_string($page); instead of the first line. I've also edited the answer with a DOMDocument, in case that doesn't work.

Eike Over a year ago

Thank you - the alternative DOMDocument approach works great!
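One way to combine the two suggestions from these comments is to try SimpleXML first and fall back to DOMDocument when the page is not well-formed. The helper name loadMarkup and the sample $page are hypothetical, so treat this as a sketch rather than a drop-in solution.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement for $page whether or not
// the markup is well-formed XML, or null if it cannot be parsed at all.
function loadMarkup(string $page): ?SimpleXMLElement
{
    $previous = libxml_use_internal_errors(true);

    // First attempt: strict XML parsing, which fails on typical real-world HTML.
    $sxml = simplexml_load_string($page);

    if ($sxml === false) {
        // Fallback: the lenient HTML parser, then convert the DOM to SimpleXML.
        $doc = new DOMDocument();
        $sxml = $doc->loadHTML($page) ? simplexml_import_dom($doc) : null;
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $sxml instanceof SimpleXMLElement ? $sxml : null;
}

// Made-up sample: the unclosed <br> makes this invalid as XML.
$page = '<table><tr><td><span class="">Product</span><br></td>'
    . '<td>1.0</td><td>PN-001</td></tr></table>';

$sxml = loadMarkup($page);
$rows = $sxml->xpath('//tr[td/span[@class=""]]');

foreach ($rows as $row) {
    echo "Version: ", $row->td[1], ", Partnumber: ", $row->td[2], PHP_EOL;
}
```

Either way the caller gets a SimpleXMLElement, so the same XPath and the same $row->td[n] access work for both well-formed and messy pages.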

Sauron1953 · Answer · 2017-09-12 14:40:02Z

-1

I would use the following XPath expression:

//td[text()="Version"] | //td[text()="Partnumber"]

Which gives me:

Element='<td>Version</td>'
Element='<td>Partnumber</td>'  
Element='<td>Version</td>'
Element='<td>Partnumber</td>'
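As the output shows, this expression matches the cells in both rows. A possible refinement, sketched below, is to prefix it with a row predicate on the empty class. It assumes the cells literally contain the text Version and Partnumber, as in the sample table from the question.

```php
<?php
// Sample table from the question, with its literal cell text.
$page = '<table><tbody>'
    . '<tr><td></td><td></td><td><span class="">Product</span></td><td>Version</td><td>Partnumber</td></tr>'
    . '<tr><td></td><td></td><td><span class="subcomponent">Component</span></td><td>Version</td><td>Partnumber</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Original expression: matches the cells regardless of the span's class.
$all = $xpath->query('//td[text()="Version"] | //td[text()="Partnumber"]');

// Restricted to rows that contain a span with an empty class attribute.
$filtered = $xpath->query(
    '//tr[td/span[@class=""]]/td[text()="Version" or text()="Partnumber"]'
);

echo $all->length, ' cells without the row filter, ',
    $filtered->length, ' with it', PHP_EOL;
```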

answered Sep 12, 2017 at 14:40

