
I need to load content from a remote URI into a PHP variable locally. The remote page only shows its content when JavaScript is turned on. How can I get around this?

Essentially, how can I use cURL for pages whose content is loaded by JavaScript?

  • How can that be possible? I don't think they can actually check for that... Commented Aug 21, 2012 at 14:03
  • It is possible if you have an AJAX call or something to report to the server. In that way, the page could hide contents until fetched by the 2nd request. This is often done to prevent scraping though. Commented Aug 21, 2012 at 14:07
  • Do you think maybe they don't want you scraping the site? Commented Aug 21, 2012 at 14:07

2 Answers


Mink was the only PHP headless browser that I could find. As noted, Selenium is another popular choice. I don't know how well these will perform if you have a lot of scraping to do, though; they seem to be geared more towards testing.

A number of other languages have headless browsers, which are listed in the link below. Since PHP does not process JavaScript, you will need another tool. Headless browsers expose the JavaScript engine and allow you to interact with the browser programmatically.

headless internet browser?
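
As a rough illustration of the Mink approach, the sketch below loads a page in a real browser through Selenium and copies the rendered HTML into a PHP variable. It assumes the `behat/mink` and `behat/mink-selenium2-driver` packages are installed via Composer and that a Selenium server is listening on `localhost:4444`. The URL and the five-second timeout are placeholders, and the `document.readyState` check is only a guess at when the page is "done"; a page that fills itself in via AJAX will probably need a condition specific to that page.

```php
<?php
// Minimal sketch, not a definitive implementation.
// Assumed setup: composer require behat/mink behat/mink-selenium2-driver
// plus a Selenium server running on localhost:4444.
require __DIR__ . '/vendor/autoload.php';

use Behat\Mink\Driver\Selenium2Driver;
use Behat\Mink\Session;

$url = 'http://example.com/'; // placeholder: the remote page to load

// Drive Firefox through the (assumed) local Selenium server.
$driver  = new Selenium2Driver('firefox', null, 'http://localhost:4444/wd/hub');
$session = new Session($driver);
$session->start();

try {
    $session->visit($url);

    // Wait up to 5000 ms for the JavaScript condition to become true.
    // Replace the condition with one that matches the target page,
    // e.g. a check that the element you need exists.
    $session->wait(5000, "document.readyState === 'complete'");

    // The DOM as the browser sees it after scripts have run.
    $html = $session->getPage()->getContent();
} finally {
    // Always close the browser session, even if the visit failed.
    $session->stop();
}

echo strlen($html), " bytes of rendered HTML\n";
```

Starting a full browser for every request is much slower than a plain cURL call, which fits the performance concern above.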



To do this you have to drive a real browser with an automation tool such as Selenium. This will involve slightly more than just a simple GET request, though.

http://seleniumhq.org/
