1

I'm fairly new to cURL and XPath, so I'm still learning the ins and outs. I have written a scraper, but when I try to display the scraped data as an array, nothing shows up. What is wrong with my code?

<?php

ini_set("display_errors", "1");
error_reporting(-1);
error_reporting(E_ERROR);
libxml_use_internal_errors(true);

//Basic Function
function get_url_contents($url, $timeout = 10, $userAgent = 'Mozilla/5.0(Macintosh; U; Intel Mac OS X 10_5_8; en-US)AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.215 Safari/534.10'){
    $rawhtml = curl_init();//handler
    curl_setopt($rawhtml, CURLOPT_URL,$url);//url
    curl_setopt($rawhtml, CURLOPT_RETURNTRANSFER, 1);//return result as string rather than direct output
    curl_setopt($rawhtml, CURLOPT_CONNECTTIMEOUT,$timeout);//set timeout
    curl_setopt($rawhtml, CURLOPT_USERAGENT,$userAgent);//set user agent
    $output = curl_exec($rawhtml);//execute curl call
    curl_close($rawhtml);//close connection

    if(!$output){
        return -1;//if nothing obtained, return -1
    }
    return $output;
}

//get raw html
$html_string = get_url_contents("http://www.beursgorilla.nl/fonds-informatie.asp?naam=Aegon&cat=koersen&subcat=1&instrumentcode=955000020");//url here
//load HTML into DOM object
//ref http://www.php.net/manual/en/domdocument.loadhtml.php
//note: HTML does not have to be well formed for this function

$dom_object = new DOMDocument();
@$dom_object->loadHTML($html_string);

//perform Xpath queries on DOM
//ref http://www.php.net/manual/en/domxpath.query.php

$xpath = new DOMXPath($dom_object);

//perform Xpath query
//use any specific property to narrow focus

$nodes = $xpath->query("//table[@class='maintable']/tbody/tr[4]/td[2]/table[@class='koersen_tabel']/tbody/tr[2]/td[@class='koersen_tabel_midden']");

//setup some basic variables

$i = -1; //$i = counter

//when processing nodes as below, cycle through the rows
//but do not grab data from the header row of the table

$result = array();

//perform xpath subqueries to get numbers

foreach($nodes as $node){
    $i++;
    //using each 'node' as the limit for the new xpath to search within
    //make queries relative by starting them with a dot (e.g. ".//...")

    $details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[1]/td[2]", $node);
    foreach($details as $detail){
        $result[$i][''] = $detail->nodeValue;
    }

    $details = $xpath->query("//table[3]/tbody/tr/td[1]/table[@class='fonds_info_koersen_links']/tbody/tr[4]/td[2]", $node);
    foreach($details as $detail){
         $result[$i][''] = $detail->nodeValue;
    }

    if(curl_errno($rawhtml)){
        echo 'Curl error: ' . curl_error($rawhtml);

        print'<pre>';   
        print_r($result);
        print '</pre>';
    }
}

?>

I have checked the XPath queries in Chrome's element inspector and they seem to be correct. I really don't know what is wrong with the code.

1
  • Use more echo statements to see what is going on in the script: print every variable and note which if/foreach block is executed. Commented Jul 13, 2014 at 15:48
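
One way to follow this advice is to print how many nodes each XPath query matched before doing anything with the results. The following is a minimal sketch, not the asker's code: the inline markup, the labels, and the $counts array are hypothetical stand-ins for the fetched page. It also illustrates one thing worth checking: as far as I know, a browser inspector shows tbody elements even when the served markup omits them, while DOMDocument does not add them, so a query copied from the inspector can match nothing.

```php
<?php
// Hypothetical inline markup standing in for the fetched page (assumption).
$html = '<table class="koersen_tabel"><tr><td>Header</td></tr>'
      . '<tr><td class="koersen_tabel_midden">7.12</td></tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The same query written with and without the tbody step.
$queries = array(
    'with tbody'    => "//table[@class='koersen_tabel']/tbody/tr[2]/td",
    'without tbody' => "//table[@class='koersen_tabel']/tr[2]/td",
);

$counts = array();
foreach ($queries as $label => $query) {
    $nodes = $xpath->query($query);
    $counts[$label] = $nodes->length;
    // A count of zero means the foreach over $nodes would never run.
    echo $label . ': ' . $nodes->length . " node(s)\n";
}
```

If a count is zero, the problem lies in the query rather than in how the array is filled.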

2 Answers

1

What about this line of code?

$result[$i][''] = $detail->nodeValue;

Shouldn't this look like:

$result[$i][] = $detail->nodeValue;

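(Note the empty square brackets: with '' as the key, each assignment overwrites the previous value, whereas [] appends a new element.)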
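
The difference between the two forms can be seen in isolation. This is a small sketch with hypothetical values standing in for the scraped node values; the names $values, $fixedKey, and $appended are made up for illustration.

```php
<?php
// Hypothetical values standing in for scraped node values (assumption).
$values = array('7.12', '7.25');

$fixedKey = array();
$appended = array();
foreach ($values as $value) {
    $fixedKey[0][''] = $value; // same key '' every time: the previous value is overwritten
    $appended[0][]   = $value; // empty brackets: PHP assigns the next integer key
}

print_r($fixedKey); // one element, stored under the key ''
print_r($appended); // two elements, stored under the keys 0 and 1
```

With the empty-string key only the last value of each row survives, so appending is likely what the scraper needs if several values per row should be kept.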


0

I rewrote my crawler using PHP Simple HTML DOM Parser. That fixed my problem; everything works now :).
