Scraping web page contents that have a specific URL

Question

I want to get the titles and URLs of entries that have a specific document link. From the markup below, I want this information: the title (Titles) and its URL (http://linkWeb.com) for each entry whose download link points to a .pdf file (http://link.pdf).

Here's the html page :

<div class="title-download">
<div id="01divTitle" class="title">
    <h3>
        <a id="01Title" onmousedown="" href="http://linkWeb.com">Titles</a>
        <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3>
</div>
<div id="01downloadDiv" class="download">
    <a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target=""><img id="ctl01_icon" class="small-icon" /></a>
</div>
</div>

And here's my code, but it returns a blank result:

<?php
include 'simple_html_dom.php';
set_time_limit(0);
$url  ='http://example.com';
$html = file_get_html($url) or die ('invalid url');

foreach($html->find('span[class=citation]') as $link){
    foreach($link->parent()->parent()->find('.download a') as $link2){  // I'm confused about this line
       if(strtolower(substr($link2->title, strrpos($link2->title, '.'))) === '.pdf') {
           $link = $link->prev_sibling();
           echo $link->plaintext.'<br>';
           echo $link->href.'<br>';
           echo $link2->title.'<br>';
       }
    }
}
?>

Wait, http://link.pdf? How does that work? Or is that just a dummy URL instead of publishing the actual site name?
– Matchu, Commented Jul 22, 2012 at 3:10
@Matchu Oh, I'm sorry. There was a typo in the HTML page. I edited it: title="http://link.pdf" is on the link inside the download div.
– bruine, Commented Jul 22, 2012 at 4:52

Matchu · Accepted Answer · 2012-07-22 03:14:21Z

1

Given that $link is the citation span, $link->parent()->parent() returns the div with ID 01divTitle. And, since that div is a sibling of the .download element you're looking for rather than a parent, $link->parent()->parent()->find('.download a') returns no results.

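Perhaps $link->parent()->parent()->parent()->find('.download a') would work better, since the third parent() call reaches the title-download wrapper, which contains both the title div and the .download element. There may be other issues, but that's definitely one of them.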
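
To make the traversal concrete, here is a minimal sketch of climbing three levels from the citation span and then searching the wrapper for the download link. It is not the asker's simple_html_dom code: it swaps in PHP's built-in DOMDocument and DOMXPath so it runs without the library. The function name findPdfEntries and the sample markup (with the wrapper div closed) are assumptions made for illustration.

```php
<?php
// Sample markup modeled on the question's page; the outer wrapper is closed
// here (an assumption) so the fragment forms one complete title/download pair.
$html = <<<'HTML'
<div class="title-download">
  <div id="01divTitle" class="title">
    <h3>
      <a id="01Title" href="http://linkWeb.com">Titles</a>
      <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span>
    </h3>
  </div>
  <div id="01downloadDiv" class="download">
    <a id="01_downloadIcon" title="http://link.pdf"><img id="ctl01_icon" class="small-icon" /></a>
  </div>
</div>
HTML;

/**
 * Hypothetical helper: returns the title, page URL and PDF URL for every
 * entry whose download link's title attribute ends in ".pdf".
 */
function findPdfEntries(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $entries = [];

    foreach ($xpath->query('//span[@class="citation"]') as $span) {
        // span -> h3 -> div.title -> div.title-download: three levels up,
        // the counterpart of parent()->parent()->parent() in the answer.
        $wrapper = $span->parentNode->parentNode->parentNode;

        // The title link is the nearest <a> sibling before the citation span.
        $titleLink = $xpath->query('preceding-sibling::a[1]', $span)->item(0);
        if ($titleLink === null) {
            continue;
        }

        // Search inside the wrapper, where the .download div actually lives.
        foreach ($xpath->query('.//div[@class="download"]//a', $wrapper) as $download) {
            $pdfUrl = $download->getAttribute('title');
            if (strtolower(substr($pdfUrl, -4)) === '.pdf') {
                $entries[] = [
                    'title' => trim($titleLink->textContent),
                    'url'   => $titleLink->getAttribute('href'),
                    'pdf'   => $pdfUrl,
                ];
            }
        }
    }

    return $entries;
}

foreach (findPdfEntries($html) as $entry) {
    echo $entry['title'], ' | ', $entry['url'], ' | ', $entry['pdf'], PHP_EOL;
}
```

Stopping at two levels, as in the question, would make the wrapper the title div, and the inner query would match nothing because the download div is its sibling, not its descendant.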

