
I have to read information from an HTML page and transfer it into several arrays for further processing. So far my attempts with XPath have not given me access to the data I need.

The body contains a table with a varying number of rows, as in the following example:

    ...
    </tr>
    <tr>
        <td class="name" title="43PUS6551" datalabel="43PUS6551">
            <span>43PUS6551</span>
        </td>
        <td datalabel="Internetnutzung" class="usage">eingeschränkt</td>
        <td datalabel="Onlinezeit heute" class="bar time">
            <span title="03:20 von 14:00 Stunden">
                <span style="width:23.81%;"/>
            </span>
        </td>
        <td datalabel="Zugangsprofil" class="profile">
            <select name="profile:user6418">
                <option value="filtprof1">Standard</option>
                <option value="filtprof3">Unbeschränkt</option>
                <option value="filtprof4">Gesperrt</option>
                <option value="filtprof5334">Network</option>
                <option value="filtprof5333" selected="selected">Stream</option>
                <option value="filtprof4526">X-Box_One</option>
            </select>
        </td>
        <td datalabel="" class="btncolumn">
            <button type="submit" name="edit" id="uiEdit:user6418" value="filtprof5333" class="icon edit" title="Bearbeiten"/>
        </td>
    </tr>
    <tr>
    ...

I need one array that uses the title attribute of the first <td> (class="name") as the key and the name attribute of the <select> element in the same row as the value:

    $devices = [
        '43PUS6551' => 'profile:user6418',
        …
    ];

I started with the following code, which already gives me the keys for this array:

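    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($response);              // $response holds the HTML of the page
    $xmlSite = simplexml_import_dom($dom);

    $devices = [];
    $rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');  // the name cells of the device rows
    foreach ($rows as $row) {
        $key = utf8_decode((string)$row->attributes()['title']);
        // ??? how do I get the name attribute of the <select> in the same row?
    }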

But now I'm struggling to get the matching value. I tried several approaches: going up with parent and back down to the <select> node, or using following-sibling, but I can't get the XPath syntax right.
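
One way to do this is to evaluate a second XPath expression relative to each matched cell, passing the cell as the context node. The sketch below uses DOMXPath rather than SimpleXML; the helper name mapDevicesToSelects and the shortened sample HTML are hypothetical, and it assumes the device cell and its <select> sit in the same <tr>, as in the example above.

```php
<?php
// Sketch: map each device name to the name attribute of the <select>
// in the same table row. mapDevicesToSelects() is a hypothetical helper.

function mapDevicesToSelects(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate markup warnings from the page
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $devices = [];
    // Same selection as in the question: cells whose title equals datalabel
    foreach ($xpath->query('//tr/td[@title=@datalabel]') as $cell) {
        // Relative to the cell: move to the following sibling cells, then down to the <select>.
        // '../td/select/@name' (up to the parent <tr>, then down) selects the same attribute.
        $names = $xpath->query('following-sibling::td/select/@name', $cell);
        if ($names->length > 0) {
            $devices[$cell->getAttribute('title')] = $names->item(0)->value;
        }
    }
    return $devices;
}

// Hypothetical, shortened version of the table from the question
$html = <<<'HTML'
<table>
<tr>
  <td class="name" title="43PUS6551" datalabel="43PUS6551"><span>43PUS6551</span></td>
  <td datalabel="Zugangsprofil" class="profile">
    <select name="profile:user6418">
      <option value="filtprof1">Standard</option>
      <option value="filtprof5333" selected="selected">Stream</option>
    </select>
  </td>
</tr>
</table>
HTML;

print_r(mapDevicesToSelects($html));
```

The key point is the second argument to DOMXPath::query: it makes the expression relative to the current cell, so each row only sees its own <select>.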

Once that works, I need a second array that uses the name attribute of the <select> element as the key and the value attribute of the selected <option> as the value:

    $filters = [
        'profile:user6418' => 'filtprof5333',
        …
    ];
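
For this array, a predicate on the selected attribute picks out the chosen <option> inside each <select>. This is a minimal sketch assuming the selected option is marked with a selected attribute, as in the sample HTML; the helper name mapSelectsToSelectedOptions and the sample markup are hypothetical.

```php
<?php
// Sketch: map each <select> name to the value of its selected <option>.
// mapSelectsToSelectedOptions() is a hypothetical helper.

function mapSelectsToSelectedOptions(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate markup warnings from the page
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $filters = [];
    foreach ($xpath->query('//select[@name]') as $select) {
        // option[@selected] keeps only the <option> that carries a selected attribute
        $selected = $xpath->query('option[@selected]/@value', $select);
        if ($selected->length > 0) {
            $filters[$select->getAttribute('name')] = $selected->item(0)->value;
        }
    }
    return $filters;
}

// Hypothetical, shortened version of the table from the question
$html = <<<'HTML'
<table>
<tr>
  <td class="profile">
    <select name="profile:user6418">
      <option value="filtprof1">Standard</option>
      <option value="filtprof5333" selected="selected">Stream</option>
    </select>
  </td>
</tr>
</table>
HTML;

print_r(mapSelectsToSelectedOptions($html));
```

A <select> without any selected attribute is skipped here; a browser would treat its first option as selected, so adjust that case if the page can produce it.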

Finally, I need an array that maps each option label to its value, taken from the <option> elements (the same dropdown appears in every row):

    $profiles = [
        'Standard' => 'filtprof1',
        'Unbeschränkt' => 'filtprof3',
        …
        'X-Box_One' => 'filtprof4526',
    ];
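
Because every row repeats the same dropdown, reading the options of a single <select> is enough. The sketch below takes the first one with (//select)[1]; note that //select[1] would instead match every <select> that is the first of its siblings. The helper name mapOptionLabelsToValues is hypothetical, and the sample uses ASCII labels only: labels with umlauts such as "Unbeschränkt" may need the same encoding handling as the question's code (which uses utf8_decode for this).

```php
<?php
// Sketch: map each option label to its value, using only the first <select>
// because every row repeats the same options. Hypothetical helper.

function mapOptionLabelsToValues(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate markup warnings from the page
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $profiles = [];
    // Parentheses first select all <select> elements, then take the first of that set
    foreach ($xpath->query('(//select)[1]/option') as $option) {
        $profiles[trim($option->textContent)] = $option->getAttribute('value');
    }
    return $profiles;
}

// Hypothetical sample: two rows repeating the same (ASCII-only) options
$html = <<<'HTML'
<table>
<tr><td><select name="profile:user6418">
  <option value="filtprof1">Standard</option>
  <option value="filtprof4">Gesperrt</option>
  <option value="filtprof4526">X-Box_One</option>
</select></td></tr>
<tr><td><select name="profile:user7000">
  <option value="filtprof1" selected="selected">Standard</option>
  <option value="filtprof4">Gesperrt</option>
  <option value="filtprof4526">X-Box_One</option>
</select></td></tr>
</table>
HTML;

print_r(mapOptionLabelsToValues($html));
```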

Any hints on the proper XPath expressions would be appreciated.

  • PHP DomDocument may be what you are looking for. Commented Jul 31, 2019 at 23:21
  • Could I have tried xpath without PHP DOMDocument? Commented Aug 1, 2019 at 12:41
  • Well at this point, it depends on your code, you didn't put any code here though. Take a look at this answer, might help Difference between simplexml and Dom Commented Aug 1, 2019 at 13:24

2 Answers


Try this:

    // collect label => value pairs from every <option>, including the selected one
    preg_match_all('/<option value="([^"]+)"[^>]*>([^<]+)<\/option>/', $str, $match, PREG_SET_ORDER);
    $profiles = array();
    foreach ($match as $row) {
        $profiles[$row[2]] = $row[1];   // label => value
    }
    print_r($profiles);

1 Comment

To be honest, I don't like preg_match very much, especially when, as in this case, I can't be sure what content the website will return. I would rather solve this with XPath.

The following works as intended for me:

    // convert the HTML response into SimpleXML
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($response);
    $xmlSite = simplexml_import_dom($dom);

    // initialize the result arrays
    $devices = [];
    $profiles = [];
    $filters = [];

    // parse the SimpleXML with XPath to get the current data
    $rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');  // the cells naming the devices, one per row
    foreach ($rows as $row) {
        $key = utf8_decode((string)$row->attributes()['title']);    // name (label) of the device
        if (preg_match('/Alle /', $key)) {                          // skip the default settings rows
            continue;
        }
        $select = $row->xpath('parent::*//select[@name]');  // the <select> in the same row
        if (empty($select)) {                               // row without a dropdown
            continue;
        }
        $value = (string)$select[0]->attributes()['name'];  // current ID ('profile:user*' or 'profile:landevice*')
        $devices[$key] = $value;

        $options = $select[0]->xpath('option');             // the defined filters (dropdown in each row)
        foreach ($options as $option) {
            $profiles[utf8_decode((string)$option)] = (string)$option->attributes()['value'];   // label and ID of each filter
            if (isset($option->attributes()['selected'])) {     // the filter currently assigned to the device
                $filters[$value] = (string)$option->attributes()['value'];  // device ID => filter ID
            }
        }
    }
