I am new to scraping and have scraped two websites before. The problem appeared when I tried to scrape websites that load their content dynamically: when a page is rendered with JavaScript, the response I fetch does not contain the content I want to scrape.

Is there any way to scrape the contents of such a website using PHP cURL or another PHP client?

This is what I have done so far:

<?php
$link = "https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword=android+developer&sc.keyword=android+developer&locT=N&locId=192&jobType=";

// Fetch the raw HTML. cURL does not execute JavaScript, so only the initial markup is returned.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
$data = curl_exec($ch);
if ($data === false) {
    die("cURL error: " . curl_error($ch));
}

// Parse the response and print the text of every <div>.
libxml_use_internal_errors(true); // suppress warnings from malformed HTML
$document = new DOMDocument();
$document->loadHTML($data);
$elements = $document->getElementsByTagName("div");

foreach ($elements as $element) {
    echo $element->nodeValue . "<br>";
}
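
To illustrate the problem without any network access, here is a minimal, self-contained sketch. The HTML string and the `jobs` id are made up for the example and are not taken from the site above. The `<div>` is only filled in by a script, and since `DOMDocument` parses markup without running scripts, its text stays empty:

```php
<?php
// Hypothetical page: the <div> is empty in the markup and only filled in by a script.
$html = '<html><body><div id="jobs"></div>'
      . '<script>document.getElementById("jobs").textContent = "Android Developer";</script>'
      . '</body></html>';

libxml_use_internal_errors(true); // suppress parser warnings
$document = new DOMDocument();
$document->loadHTML($html);

// Look up the <div> that the script would have populated in a real browser.
$xpath = new DOMXPath($document);
$jobsDiv = $xpath->query('//div[@id="jobs"]')->item(0);
$jobsText = $jobsDiv->textContent;

var_dump($jobsText); // string(0) ""
```

The same thing happens with the cURL response: it contains the initial markup only, before any script has run.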

1 Answer

You need a headless browser for this, because the page has to be rendered (with its JavaScript executed) before the content exists in the HTML. You can use the PHP wrapper for PhantomJS: http://jonnnnyw.github.io/php-phantomjs/. It has the following features:

  • Load webpages through the PhantomJS headless browser
  • View detailed response data including page content, headers, status code etc.
  • Handle redirects
  • View JavaScript console errors
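
A minimal sketch of how this could look, modelled on the basic usage from the php-phantomjs documentation. It assumes the library was installed with Composer (package `jonnyw/php-phantomjs`), that the Composer autoloader has been loaded by the calling script, and that a PhantomJS binary is available to the library. The function names `fetchRenderedHtml` and `extractDivText` are hypothetical helpers, not part of the library:

```php
<?php
// Assumption: the caller has already run `require __DIR__ . '/vendor/autoload.php';`
use JonnyW\PhantomJs\Client;

/**
 * Load a page through PhantomJS and return the HTML after scripts have run.
 * Returns null when the response status is not 200.
 */
function fetchRenderedHtml(string $url): ?string
{
    $client   = Client::getInstance();
    $request  = $client->getMessageFactory()->createRequest($url, 'GET');
    $response = $client->getMessageFactory()->createResponse();

    $client->send($request, $response);

    return $response->getStatus() === 200 ? $response->getContent() : null;
}

/**
 * Collect the trimmed, non-empty text of every <div> in an HTML string.
 */
function extractDivText(string $html): array
{
    $previous = libxml_use_internal_errors(true); // suppress warnings from malformed HTML
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($document->getElementsByTagName('div') as $div) {
        $text = trim($div->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}
```

With these in place, the rendered HTML returned by `fetchRenderedHtml($link)` can be passed to `extractDivText()` in place of the cURL response from the question. Pages that load data after the initial render may need extra waiting, which this sketch does not handle.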

Hope this helps.
