I'm trying to extract information with curl and DOMDocument. I need to get all the links inside one particular div.

But it shows me nothing, and I don't understand why, because it works without curl.

  function media_uri_request($url, $method='', $vars='') 
  {
        $ch = curl_init();
        if ($method == 'post') 
        {
        curl_setopt ($ch, CURLOPT_POST, 1);
        curl_setopt ($ch, CURLOPT_POSTFIELDS, $vars);
        }

        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_HEADER, false);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt ($ch, CURLOPT_FAILONERROR, false);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
        curl_setopt ($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: ".$_SERVER['REMOTE_ADDR'], "HTTP_X_FORWARDED_FOR: ".$_SERVER['REMOTE_ADDR']));
        curl_setopt ($ch, CURLOPT_COOKIEJAR, 'tmp/cookie.txt');
        curl_setopt ($ch, CURLOPT_COOKIEFILE, 'tmp/cookie.txt');
        curl_setopt ($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt ($ch, CURLOPT_TIMEOUT, 0);
        curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 0);
        $buffer = curl_exec($ch);
        curl_close ($ch);

        if (isset($buffer) && filter_var($buffer, FILTER_SANITIZE_URL)) {
        $urls = Array();

        $dom = new DOMDocument();
        @$dom->loadHTMLFile($buffer);

        foreach($dom->getElementsByTagName('a') as $buffer) {
            $urls[] = Array(
                'name'  => $buffer->nodeValue,
                'href'  => $buffer->getAttribute('href'),
                'title' => $buffer->getAttribute('title'),
                'rel'   => $buffer->getAttribute('rel'),
                'id'    => $buffer->getAttribute('id'),
            );
        }
        return $urls;
    }
  }

It currently returns all the links on the page, but I only want the links inside the div with a given id.

<div id="something">
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
<a href="anylink">sometitle</a>
</div>

Can you help me please?

1 Answer

Replace

foreach($dom->getElementsByTagName('a') as $buffer) {

with

foreach($dom->getElementById('something')->getElementsByTagName('a') as $buffer) {

This finds the div by its id first and then searches its descendants for links. See the documentation for DOMDocument::getElementById() for more info.
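Note that getElementById() returns null when no element has the requested id, and calling getElementsByTagName() on null is a fatal error. A minimal sketch of a guarded lookup follows; the helper name links_in_element and the sample markup are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: collect the links inside the element with the given id.
function links_in_element(DOMDocument $dom, string $id): array
{
    $container = $dom->getElementById($id);
    if ($container === null) {
        // No element with that id was found; return an empty list instead of failing.
        return [];
    }

    $urls = [];
    foreach ($container->getElementsByTagName('a') as $link) {
        $urls[] = [
            'name' => $link->nodeValue,
            'href' => $link->getAttribute('href'),
        ];
    }
    return $urls;
}

// Sample markup modelled on the question, plus one link outside the div.
$html = '<html><body><a href="outside">skip</a><div id="something">'
      . '<a href="anylink">sometitle</a></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$links   = links_in_element($dom, 'something');
$missing = links_in_element($dom, 'does-not-exist');
```

With this guard, a missing id gives an empty array rather than an error, which makes it easier to tell "the id was not found" apart from "the div has no links".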

There is a different way:

$xpath = new DOMXPath($dom);
$elements = $xpath->query("//*[@id='something']");
if ($elements->length > 0) {
    foreach ($elements->item(0)->getElementsByTagName('a') as $buffer) {
        // build the $urls entries here, as in the original loop
    }
}

Also, use @$dom->loadHTML($buffer); instead of loadHTMLFile(). The value returned by curl_exec() is an HTML string, not a file path.
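Putting the two points together, here is a minimal sketch that parses a string and selects only the links inside the div. The $buffer value is a stand-in for the string curl_exec() would return, and libxml_use_internal_errors() is used here as an alternative to the @ operator; both are assumptions for the sake of a self-contained example.

```php
<?php
// Stand-in for the HTML string that curl_exec() returns.
$buffer = '<div id="other"><a href="elsewhere">ignored</a></div>'
        . '<div id="something">'
        . '<a href="anylink" title="t1">sometitle</a>'
        . '<a href="anylink2" title="t2">othertitle</a>'
        . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($buffer);          // parse a string; loadHTMLFile() expects a path or URL
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select every <a> that is a descendant of the element whose id is "something".
$urls = [];
foreach ($xpath->query("//*[@id='something']//a") as $link) {
    $urls[] = [
        'name'  => $link->nodeValue,
        'href'  => $link->getAttribute('href'),
        'title' => $link->getAttribute('title'),
    ];
}
```

If no element matches, the query returns an empty node list and $urls stays empty, so no extra null check is needed in this variant.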

3 Comments

Thanks, but I get the error "Call to a member function getElementsByTagName() on a non-object" and I don't know why.
This means there is no element with id "something" in the document.
I don't know, but it isn't working correctly for me: it returns a blank page even though the id exists in the HTML. When I call only getElementsByTagName('a'), I get all the links on the page. Thanks for your time.
