
I'm using the following code to scrape some external divs from http://psnc.org.uk/our-latest-news-category/psnc-news/

I want to scrape the Latest News section of the PSNC News page.

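```php
<?php
// Fetch the page; CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string.
$ch = curl_init("http://psnc.org.uk/our-latest-news-category/psnc-news/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
if ($output === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);

// Parse the HTML; internal errors hide warnings about malformed or HTML5 markup.
$document = new DOMDocument;
libxml_use_internal_errors(true);
$document->loadHTML($output);
$xpath = new DOMXPath($document);

// Select every <article class="news-template-box"> element.
$tweets = $xpath->query("//article[@class='news-template-box']");

echo "<html><body>";
foreach ($tweets as $tweet) {
    echo "\n<p>" . $tweet->nodeValue . "</p>\n";
}
echo "</body></html>";
```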

It successfully scrapes the text, but the links, hrefs and images (in fact, all of the HTML elements) do not appear.

Am I missing something?

  • When you use $xpath->query("*"); you get all the data. Commented Jan 5, 2017 at 16:16
  • I only want to scrape a div, not the entire page. Commented Jan 5, 2017 at 16:18
  • Which div? Commented Jan 5, 2017 at 16:22
  • article class="news-template-box" Commented Jan 5, 2017 at 16:26
  • Or <div class="page-content twelve columns clear"> Commented Jan 5, 2017 at 16:26
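The second candidate in the comments, `<div class="page-content twelve columns clear">`, carries several classes. An exact test such as `[@class='page-content']` would not match it, because `@class` compares the whole attribute string. A common XPath 1.0 idiom matches one whitespace-separated class token instead. This is a minimal sketch: `classTokenXPath()` is a hypothetical helper, and the HTML string is invented sample markup, not the real page.

```php
<?php
/**
 * Hypothetical helper: build an XPath expression matching <$tag> elements
 * whose class attribute contains $class as a whole, space-separated token.
 * Assumes $class contains no quote characters.
 */
function classTokenXPath(string $tag, string $class): string
{
    return sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $class
    );
}

// Invented sample markup standing in for the real page.
$html = '<html><body>'
      . '<div class="page-content twelve columns clear"><p>Latest news</p></div>'
      . '<div class="sidebar"><p>Other</p></div>'
      . '</body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Matches the first div even though it has four classes.
$nodes = $xpath->query(classTokenXPath('div', 'page-content'));
foreach ($nodes as $node) {
    echo $document->saveHTML($node), "\n";
}
```

Padding both the attribute value and the token with spaces keeps `page` from matching inside `page-content`.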

1 Answer


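For element nodes, DOMNode::nodeValue is the same as DOMNode::textContent: it returns only the text content, with every tag stripped. To output the element's markup, including its links and images, pass the node to DOMDocument::saveHTML() instead.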

http://php.net/manual/en/class.domnode.php#domnode.props.nodevalue
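The difference can be shown in isolation with a minimal sketch. It assumes the DOM extension is enabled. The HTML string is invented sample markup standing in for one article from the page, so no network request is made.

```php
<?php
// Invented sample markup standing in for one "news-template-box" article.
$html = '<html><body>'
      . '<article class="news-template-box">'
      . '<a href="/news/item-1"><img src="/img/item-1.jpg" alt="">Item 1</a>'
      . '</article>'
      . '</body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings about HTML5 tags such as <article>
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$article = $xpath->query("//article[@class='news-template-box']")->item(0);

// nodeValue on an element returns only its text content; the tags are gone.
$text = $article->nodeValue;

// saveHTML() with a node argument serialises that node itself, markup included.
$markup = $document->saveHTML($article);

echo $text, "\n";   // Item 1
echo $markup, "\n";
```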

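```php
$tweets = $xpath->query("//article[@class='news-template-box']");

foreach ($tweets as $tweet) {
    // saveHTML() with a node argument serialises that node and its children,
    // so tags such as <a href="..."> and <img> are preserved.
    echo $document->saveHTML($tweet);
}
```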
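If you need the link targets and image sources as data rather than as echoed markup, the same DOMXPath object can run queries relative to each article by passing the article as the context node. This is a minimal sketch under stated assumptions: `extractLinksAndImages()` is a hypothetical helper, it assumes the articles contain plain `<a href>` and `<img src>` tags, and the sample HTML is invented.

```php
<?php
/**
 * Hypothetical helper: collect link targets and image sources inside
 * every <article class="news-template-box"> of an HTML string.
 */
function extractLinksAndImages(string $html): array
{
    $document = new DOMDocument();
    libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($document);

    $results = [];
    foreach ($xpath->query("//article[@class='news-template-box']") as $article) {
        // The leading "." makes each query relative to $article, not the whole document.
        $links = [];
        foreach ($xpath->query('.//a[@href]', $article) as $a) {
            $links[] = $a->getAttribute('href');
        }
        $images = [];
        foreach ($xpath->query('.//img[@src]', $article) as $img) {
            $images[] = $img->getAttribute('src');
        }
        $results[] = ['links' => $links, 'images' => $images];
    }
    return $results;
}

// Usage with invented sample markup; in the question, the input would be $output from cURL.
$sample = '<html><body>'
        . '<article class="news-template-box"><a href="/a"><img src="/a.jpg" alt=""></a></article>'
        . '<article class="news-template-box"><a href="/b">B</a></article>'
        . '</body></html>';
print_r(extractLinksAndImages($sample));
```

Relative paths such as `/a.jpg` are returned exactly as written in the page, so they may need the site's base URL prepended before they will resolve on another site.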