PHP CURL and DOMDocument error

Question

I'm trying to extract information with cURL and DOMDocument: I need to get all the links inside a particular div.

However, it doesn't show me anything, and I don't understand why, because it works without cURL.

  function media_uri_request($url, $method='', $vars='')
  {
      $ch = curl_init();
      if ($method == 'post')
      {
          curl_setopt ($ch, CURLOPT_POST, 1);
          curl_setopt ($ch, CURLOPT_POSTFIELDS, $vars);
      }

      curl_setopt ($ch, CURLOPT_URL, $url);
      curl_setopt ($ch, CURLOPT_HEADER, false);
      curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
      curl_setopt ($ch, CURLOPT_FAILONERROR, false);
      curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
      curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false);
      curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
      curl_setopt ($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: ".$_SERVER['REMOTE_ADDR'], "HTTP_X_FORWARDED_FOR: ".$_SERVER['REMOTE_ADDR']));
      curl_setopt ($ch, CURLOPT_COOKIEJAR, 'tmp/cookie.txt');
      curl_setopt ($ch, CURLOPT_COOKIEFILE, 'tmp/cookie.txt');
      curl_setopt ($ch, CURLOPT_MAXREDIRS, 10);
      curl_setopt ($ch, CURLOPT_TIMEOUT, 0);
      curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
      $buffer = curl_exec($ch);
      curl_close ($ch);

      if (isset($buffer) && filter_var($buffer, FILTER_SANITIZE_URL)) {
          $urls = Array();

          $dom = new DOMDocument();
          @$dom->loadHTMLFile($buffer);

          foreach($dom->getElementsByTagName('a') as $buffer) {
              $urls[] = Array(
                  'name'  => $buffer->nodeValue,
                  'href'  => $buffer->getAttribute('href'),
                  'title' => $buffer->getAttribute('title'),
                  'rel'   => $buffer->getAttribute('rel'),
                  'id'    => $buffer->getAttribute('id'),
              );
          }
          return $urls;
      }
  }

It is currently showing me all the links on the page, but I only want the links inside one div, which I need to select by its id.

<div id="something">
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
</div>

Can you help me please?

Sergey Eremin · Accepted Answer · 2012-10-03 23:44:02Z

Replace

foreach($dom->getElementsByTagName('a') as $buffer) {

with

foreach($dom->getElementById('something')->getElementsByTagName('a') as $buffer) {

This finds the div by its id first and then searches only its descendants for links. See the documentation for DOMDocument::getElementById() for more info.
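As a minimal sketch of that scoped lookup, the snippet below parses a small hard-coded HTML string (a hypothetical stand-in for the cURL response) and collects only the links inside the div. The variable names $html, $container and $link are illustrative assumptions, not part of the original function. It also checks the result of getElementById() before using it, since that method returns null when no element with the given id is found.

```php
<?php
// Hypothetical sample markup standing in for the cURL response body.
$html = '<html><body>'
      . '<a href="/outside">outside</a>'
      . '<div id="something">'
      . '<a href="/one" title="First">one</a>'
      . '<a href="/two">two</a>'
      . '</div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$urls = array();
$container = $dom->getElementById('something');

// getElementById() returns null when no element has that id,
// so check before calling a method on the result.
if ($container !== null) {
    foreach ($container->getElementsByTagName('a') as $link) {
        $urls[] = array(
            'name'  => $link->nodeValue,
            'href'  => $link->getAttribute('href'),
            'title' => $link->getAttribute('title'),
        );
    }
}

print_r($urls);
```

The link outside the div is skipped because the search starts at the div rather than at the document root.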

There is a different way:

$xpath = new DOMXPath($dom);
$elements = $xpath->query("//*[@id='something']");
if ($elements->length > 0) {
    foreach ($elements->item(0)->getElementsByTagName('a') as $buffer) {
        // build $urls[] here, as in the original loop
    }
}

Also, use @$dom->loadHTML($buffer); instead of loadHTMLFile(): $buffer holds the HTML as a string, not the path of a file.
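Putting the two points together, here is a rough sketch that loads the markup from a string and selects the links with a single XPath expression. The sample $buffer content and the $hrefs array are assumptions for illustration. Note that the id value is quoted inside the expression, and that libxml_use_internal_errors() is used here as one alternative to the @ operator for keeping parser warnings out of the output.

```php
<?php
// Hypothetical markup standing in for the string returned by curl_exec().
$buffer = '<div id="something"><a href="/a">A</a><a href="/b">B</a></div>'
        . '<p><a href="/c">C</a></p>';

// Collect parser warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($buffer); // parses a string; loadHTMLFile() expects a path or URL
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// Select every <a> that is a descendant of the element whose id is "something".
$links = $xpath->query("//*[@id='something']//a");

$hrefs = array();
foreach ($links as $link) {
    $hrefs[] = $link->getAttribute('href');
}

print_r($hrefs);
```

If no element has that id, the query simply yields an empty node list, so the loop body never runs and no null check is needed.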

edited Oct 3, 2012 at 23:44

answered Oct 3, 2012 at 22:59

3 Comments

aleksander haugas Over a year ago

Thanks, but I got an error: "Call to a member function getElementsByTagName() on a non-object". I don't know why.

Sergey Eremin Over a year ago

This means there is no element with id "something" in the document.

aleksander haugas Over a year ago

I don't know, but it isn't working correctly for me: it returns a blank page even though the id exists in the HTML. When I call only getElementsByTagName('a'), I get all the links on the page. Thanks for your time.
