
I'm trying to extract some information from the AEC website (e.g. http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode). The XPath query I'm running is "//x:tbody/x:tr/x:td[4]/x:a". I've tested it in XPath Checker (the Firefox extension), and it pulls up the relevant locality data.

I'm then using PHP to load the page, execute the query and then iterate through the results.

$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$html = curl_exec($ch);
curl_close($ch);

# Create a DOM parser object
$dom = new DOMDocument();
libxml_use_internal_errors(true);

$dom->loadHTML($html);

$xpath = new DOMXpath($dom);

$elements = $xpath->query('//tbody/tr/td[4]/a');

foreach ($elements as $element) {
    echo $element;
}

I'm then getting:

Warning: Invalid argument supplied for foreach() in /home/givesh5/public_html/dig/electoratesearch.php on line 41

It seems that the query is returning some sort of boolean rather than a list of matches for the query?

Relevant markup as follows:

<table cellspacing="0" rules="all" border="1" id="ContentPlaceHolderBody_gridViewLocalities" style="border-collapse:collapse;">
        <tr class="headingLink">
            <th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$StateAb&#39;)">State</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$LocalityNm&#39;)">Locality/Suburb</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$Postcode&#39;)">Postcode</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNm&#39;)">Electorate</a></th><th scope="col"><a href="javascript:__doPostBack(&#39;ctl00$ContentPlaceHolderBody$gridViewLocalities&#39;,&#39;Sort$DivisionNmRedistributed&#39;)">Redistributed Electorate</a></th><th scope="col">Other Locality(s)</th>
        </tr><tr>
            <td>VIC</td><td>BOTANIC RIDGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CANNONS CREEK</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE EAST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE NORTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE SOUTH</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>CRANBOURNE WEST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Holt&amp;filterby=Electorate&amp;divid=216">Holt</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>DEVON MEADOWS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>FIVEWAYS</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td><a href="LocalitySearchResults.aspx?filter=DEVON+MEADOWS&amp;filterby=LocalityorSuburb&amp;state=VIC">DEVON MEADOWS</a></td>
        </tr><tr>
            <td>VIC</td><td>JUNCTION VILLAGE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Flinders&amp;filterby=Electorate&amp;divid=211">Flinders</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SANDHURST</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Dunkley&amp;filterby=Electorate&amp;divid=210">Dunkley</a></td><td></td><td>&nbsp;</td>
        </tr><tr>
            <td>VIC</td><td>SKYE</td><td><a href="LocalitySearchResults.aspx?filter=3977&amp;filterby=Postcode">3977</a></td><td><a href="LocalitySearchResults.aspx?filter=Isaacs&amp;filterby=Electorate&amp;divid=219">Isaacs</a></td><td></td><td>&nbsp;</td>
        </tr>
    </table>
  • DOMXPath::query() returns false if the expression is malformed or the context node is invalid. Commented Apr 4, 2015 at 9:19
  • Can you please supply the relevant parts of the markup you are parsing? XPaths derived from Firefox come from the live DOM, which can include implied markup, so it's not reliable to get them that way. Also, what exactly are you trying to fetch? Commented Apr 4, 2015 at 9:30
  • Have updated the OP with the markup, thanks. In this case, I'm trying to fetch the link text (e.g. <a...>Text</a>) in the fourth column for each locality. In the first two rows, this would be 'Flinders', for example. Commented Apr 4, 2015 at 9:35

2 Answers


It seems that the query is returning some sort of boolean rather than a list of matches for the query?

Yes, it can return a boolean, and in that case it is FALSE. That signals an error while running the XPath query. It can be caused by either of the two parameters passed to DOMXPath::query() (PHP Manual): the XPath expression or the context node.
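
A minimal sketch of that contract, using a small inline HTML string rather than the live page (the broken expression `//table[` is just an illustrative assumption): a well-formed expression always yields a DOMNodeList, even an empty one, while a malformed one yields FALSE.

```php
<?php
// Sketch: DOMXPath::query() returns a DOMNodeList on success, false on a malformed expression.
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td>VIC</td></tr></table>');
$xpath = new DOMXPath($doc);

// Valid expression: a DOMNodeList, even when nothing matches (there is no tbody here).
$matches = $xpath->query('//tbody/tr/td');
var_dump($matches instanceof DOMNodeList);
var_dump($matches->length);

// Malformed expression (unclosed predicate): false; @ silences the accompanying warning.
$broken = @$xpath->query('//table[');
var_dump($broken);
```

So a FALSE from query() points at the expression or context node, whereas an empty DOMNodeList just means nothing matched.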

In your case you only pass one parameter, so this would signal that the XPath expression is wrong. However, the one you use is not wrong and does not cause a boolean FALSE. Since you did get that error, I assume something else went wrong, perhaps the XPath object was not fully initialized. But even when I simulated no download or only a partial download, I was not able to reproduce the error. Perhaps it's a difference in PHP version? I don't know.
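
If a failed or empty download is the suspect, one way to rule it out is to check cURL's result before handing it to DOMDocument. This is only a sketch: the helper name fetchHtml is hypothetical, and treating HTTP 400 and above as a failure is an assumption you may want to change.

```php
<?php
// Hypothetical helper: fetch a page with cURL and fail loudly instead of
// passing false or an empty string on to DOMDocument::loadHTML().
function fetchHtml(string $url, int $timeout = 5): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $html   = curl_exec($ch);                         // false on transport errors
    $error  = curl_error($ch);                        // empty string when there was no error
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 for non-HTTP URLs such as file://
    // The handle is released when $ch goes out of scope.

    if ($html === false || $html === '') {
        throw new RuntimeException(sprintf('Download of %s failed: %s', $url, $error));
    }
    if ($status >= 400) {
        throw new RuntimeException(sprintf('Download of %s returned HTTP %d', $url, $status));
    }
    return $html;
}
```

Usage would then be something like `$dom->loadHTML(fetchHtml($url));` in place of the unchecked curl_exec() result.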

As for the XPath expression itself, what adeneo and Gordon already wrote applies: the <tbody> element is inserted into the DOM by Firefox, but PHP's DOMDocument implementation behaves differently and does not add it. You can either mimic Firefox (more work) or simply match the table element the rows actually sit in, which works fine. Here is a working example:

$url = 'http://apps.aec.gov.au/eSearch/LocalitySearchResults.aspx?filter=3977&filterby=Postcode';

# Create a DOMDocument to parse HTML
$doc    = new DOMDocument();
$saved  = libxml_use_internal_errors(true);
$result = $doc->loadHTMLFile($url);
libxml_use_internal_errors($saved);
if (false === $result) {
    throw new UnexpectedValueException(sprintf('Failed to create DOMDocument from url %s', var_export($url, true)));
}

# Create a DOMXPath to get data from HTML document
$xpath = new DOMXpath($doc);

$expression = '//table/tr/td[4]/a';
$elements   = $xpath->query($expression);
if (false === $elements) {
    throw new UnexpectedValueException(sprintf('The xpath expression %s failed', var_export($expression, true)));
}

foreach ($elements as $index => $element) {
    printf("#%02d: %s\n", $index + 1, trim($element->textContent));
}

And the example output:

#01: Flinders
#02: Flinders
#03: Holt
#04: Flinders
#05: Holt
#06: Holt
#07: Flinders
#08: Holt
#09: Flinders
#10: Flinders
#11: Flinders
#12: Isaacs
#13: Dunkley
#14: Isaacs
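
To see the <tbody> difference without fetching the live page, here is a sketch that runs three expressions against two rows copied from the markup in the question (trimmed to the first four cells), once as served and once with an explicit <tbody> as a browser would show it. The helper countMatches is hypothetical, and the `//table//tr/td[4]/a` variant is an extra suggestion not taken from either answer: `//` allows any depth, so it should match whether or not a tbody is present.

```php
<?php
// Two rows copied from the AEC table, trimmed to the first four cells; note: no <tbody>.
$withoutTbody = '<table>'
    . '<tr><td>VIC</td><td>BOTANIC RIDGE</td><td>3977</td><td><a href="#">Flinders</a></td></tr>'
    . '<tr><td>VIC</td><td>CRANBOURNE</td><td>3977</td><td><a href="#">Holt</a></td></tr>'
    . '</table>';
// The same rows with an explicit <tbody>, as the browser's live DOM exposes them.
$withTbody = str_replace(['<table>', '</table>'], ['<table><tbody>', '</tbody></table>'], $withoutTbody);

// Hypothetical helper: parse $html with DOMDocument and count the nodes $expression selects.
function countMatches(string $html, string $expression): int
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);  // unlike a browser, this does not add an implied <tbody>
    return (new DOMXPath($doc))->query($expression)->length;
}

foreach (['//tbody/tr/td[4]/a', '//table/tr/td[4]/a', '//table//tr/td[4]/a'] as $expr) {
    printf("%-22s without tbody: %d, with tbody: %d\n",
        $expr, countMatches($withoutTbody, $expr), countMatches($withTbody, $expr));
}
```

If DOMDocument behaves as both answers describe, the tbody path finds nothing in the markup as served, the table/tr path finds both links, and the descendant variant finds them in both versions.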

There is no tbody in that HTML.
The browser will insert tbody elements where needed, but we're not using a browser; we're using DOMDocument, which does not insert tbody elements.

Instead, the tr elements are direct children of the table.

$elements = $xpath->query( '//table/tr/td[4]/a');

foreach ($elements as $element) {
     echo $dom->saveHTML($element);
}
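
Using saveHTML() here instead of the question's `echo $element;` matters: a DOMElement has no string conversion, so echoing it directly raises an "Object of class DOMElement could not be converted to string" error. A minimal sketch, assuming a one-row inline table, comparing saveHTML($node), which returns the element's markup, with textContent, which returns only the link text the asker was after:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td>VIC</td><td>SKYE</td><td>3977</td><td><a href="#">Dunkley</a></td></tr></table>');
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//table/tr/td[4]/a') as $element) {
    // Markup of the matched <a> element itself.
    echo $doc->saveHTML($element), "\n";
    // Only the link text, trimmed of surrounding whitespace.
    echo trim($element->textContent), "\n";
}
```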

4 Comments

// should match the selection midway through the document? In this sense, if table/tr/td was a unique selector, then we could simply leave out the preceding parts of the path and still access the same information via //table/tr/td[4]. Is that incorrect?
@Edward - Yes, that's correct. I just copied the path from the console, but on testing it, //table/tr/td[4]/a works as well. What you've got, //tbody/tr/td[4]/a, does not work.
Probably because there is no tbody, duh.
Ah ok. That makes sense. At least I'm getting a nodelist now, but unfortunately it has 0 nodes in it.
