I have been struggling for a while to make this work, but it seems that I am missing something. The scenario is this:
I am trying to get some information from a website using PHP and cURL with a DOMXPath query. I can extract content down to a certain point in the document, but below that point I get nothing, just blank output. The script that I am using is below:

$target_url = "https[:]//[www][.]bankofalbania[.]org/Tregjet/Kursi_zyrtar_i_kembimit/"; //Remove [ and ] from url
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);

$html= curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" .curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
$selector = new DOMXPath($document);

$anchors = $selector->query('/html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]/tbody/tr[1]/td[1]');
foreach ($anchors as $div) {
    $value = $div->nodeValue;
    echo $value;
}

Interestingly, if $anchors is changed to this:
$anchors = $selector->query('/html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]');
the content is extracted from the website. I should also mention that I have tried changing the query to something more direct, as below:

$anchors = $selector->query('//table[@class="table table-sm table-responsive w-100 d-block d-md-table table-bordered m-0"]/tbody/tr[1]/td[3]');

but the result is the same: nothing is returned. I don't know what I am missing here, but I can't make it work. What I want to get is the USD value from the table on the page at $target_url.
Thank you in advance :-)

1 Answer

There are no tbody tags in the HTML, and unlike a browser, PHP's DOMDocument doesn't add them automatically (keep that in mind when you use the developer tools provided by your browser). Also, the USD amount is in the third cell, so the correct XPath query is:

/html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]/tr[1]/td[3]
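
To see the difference in isolation, here is a minimal sketch. The markup, the "rates" class name, and the numbers are made up for illustration and are not the bank's real page; the point is only that a path containing /tbody matches nothing when the source HTML has no tbody tag.

```php
<?php
// Hypothetical markup standing in for the real page: a table without <tbody>.
$html = '<html><body><table class="rates">'
      . '<tr><td>US Dollar</td><td>USD</td><td>101.25</td></tr>'
      . '<tr><td>Euro</td><td>EUR</td><td>110.40</td></tr>'
      . '</table></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
$selector = new DOMXPath($document);

// With /tbody in the path nothing matches, because the parser did not add one.
$withTbody = $selector->query('//table[@class="rates"]/tbody/tr[1]/td[3]');

// Without /tbody, the third cell of the first row is found.
$withoutTbody = $selector->query('//table[@class="rates"]/tr[1]/td[3]');

// Using // before tr matches the first row whether or not a tbody is present.
$either = $selector->query('(//table[@class="rates"]//tr)[1]/td[3]');

echo $withTbody->length, "\n";
echo $withoutTbody->item(0)->nodeValue, "\n";
echo $either->item(0)->nodeValue, "\n";
```

The last form may be the safer choice if you are not sure whether the page source contains tbody, since it does not depend on the exact nesting between table and tr.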

1 Comment

Thank you very much for your help. I didn't know that. I should check for any other tags that may cause the same problem. Now everything is working perfectly. Thanks again! P.S. I have an issue with another website, where the string that I am looking for is in inline JavaScript. Is it possible to get it somehow? Thanks!
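
On the inline JavaScript question, one possible approach is to select the script elements with XPath and then pull the value out of their text with a regular expression. This is only a rough sketch under assumptions: the page content, the variable name "config", and the "rate" key are all hypothetical, and the pattern would need adapting to however the real page embeds its data.

```php
<?php
// Hypothetical page: the value lives in an inline <script>, not in an element.
$html = '<html><head>'
      . '<script>var config = {"currency":"USD","rate":101.25};</script>'
      . '</head><body><p>Rates</p></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
$selector = new DOMXPath($document);

$rate = null;
// Only look at inline scripts, i.e. those without a src attribute.
foreach ($selector->query('//script[not(@src)]') as $script) {
    // The variable name "config" is an assumption made for this sketch.
    if (preg_match('/var\s+config\s*=\s*(\{.*?\});/s', $script->nodeValue, $m)) {
        // This works only because the assumed object literal is valid JSON.
        $data = json_decode($m[1], true);
        if (is_array($data) && isset($data['rate'])) {
            $rate = $data['rate'];
            break;
        }
    }
}

echo $rate, "\n";
```

If the embedded data is not valid JSON (unquoted keys, single quotes), json_decode will return null and a more specific regular expression for the one value you need is probably simpler.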
