Is there a way to execute all of the JavaScript on a web page exactly as a browser does, without specifying which function to execute? Most of the examples I have seen specify which portion of the JavaScript to execute from the scraped page. I need to scrape all of the content, execute all of the JavaScript just like a browser, and get the final rendered markup that I can see with Chrome's Inspect tool. Is that possible?

I am sure there must be a way, but none of the PhantomJS example code seems to address this.

2 Answers

You don't specify what gets executed from the page with PhantomJS. You open the page with PhantomJS and all JavaScript that is executed in Chrome or Firefox is also executed in PhantomJS. It is a full browser without a "head".

There are some differences, though. Clicking a download link will not trigger a download. The rendering engine that PhantomJS 1.x is based on is nearly four years old, so some pages simply render differently because PhantomJS 1.x does not support a feature they use. (PhantomJS 2 is on the way and is now in unofficial "alpha" status.)

So you need to script every interaction that a user would perform on the page, in JavaScript or CoffeeScript. You don't call the page's functions; you manipulate DOM elements to simulate a user interacting with the page in the browser. It has to be done in this crude way because the PhantomJS API doesn't provide high-level, user-like functions. If you want those, look at CasperJS, which is built on top of PhantomJS/SlimerJS.

There you actually have functions like click, wait, fetchText, etc.
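Plain PhantomJS has no counterpart to CasperJS's `wait`, so scripts typically poll for a condition themselves. Below is a minimal sketch of such a helper. The name `waitFor` and its parameters are assumptions for illustration, not part of the PhantomJS API, and it is written in ES5 (no Promises, no arrow functions) so it should also run on the older JavaScript engine in PhantomJS 1.x.

```javascript
// Hypothetical helper, not part of the PhantomJS API: polls `condition`
// every `intervalMs` milliseconds until it returns true, then calls `onReady`.
// If `timeoutMs` elapses first, `onTimeout` is called instead.
// ES5 only (var, callbacks) so it also runs on PhantomJS 1.x.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (condition()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// Example use inside a PhantomJS script (commented out so the helper can be
// loaded on its own). "#results" is a hypothetical selector for content that
// the page's own JavaScript inserts after loading.
//
// var page = require("webpage").create();
// page.open("http://example.com/", function (status) {
//     if (status !== "success") { phantom.exit(1); return; }
//     waitFor(function () {
//         return page.evaluate(function () {
//             return document.querySelector("#results") !== null;
//         });
//     }, function () {
//         console.log(page.content);
//         phantom.exit(0);
//     }, function () {
//         console.log("Timed out waiting for #results");
//         phantom.exit(1);
//     }, 10000, 250);
// });
```

Polling is used because the PhantomJS script cannot subscribe to arbitrary state changes inside the page; each check crosses into the page context again through `page.evaluate`.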

4 Comments

I am not sure of the syntax here. I did ` var text = page.evaluate(function () { return document.title + '\n' + document.body.innerText; }); `, which gives me the text, but I need it with the HTML tags, as it would be seen in Inspect.
If you want the source of the body including the tags, use document.body.innerHTML inside page.evaluate, just as in any other browser. If you want the complete page source, either read page.content outside of the page context or return document.documentElement.outerHTML from inside page.evaluate. Again, PhantomJS is just a normal browser, so anything you type in the Chrome Developer Tools console you can also do inside page.evaluate. I guess you need to learn more about JavaScript in the browser to use it well. Please ask a proper question next time and do some research.
Thank you for your answer. Yes, next time I will try to ask questions that are suitable for this forum.
StackOverflow (SO) is not a forum. Forum threads tend to go on forever with many turns. The good thing on SO is that there is a rigid structure: Q&A. There is very little room for discussions. I particularly dislike long comment threads, because of course I can help you, but future readers may be overwhelmed with the amount of back and forth in the comments (comments may be deleted). 20 comments in a short amount of time automatically raises a moderator flag. I can give you pointers in the comments, but the real work has to be done by you. If you can't do it, I'm happy to answer your question.

This will work: put the following in a file named "scrape.js" and run it with phantomjs, passing your URL as the first argument.

// Usage: phantomjs scrape.js http://your.url.to.scrape.com
"use strict";
var sys = require("system"),
    page = require("webpage").create(),
    url = sys.args[1];

////////////////////////////////////////////////////////////////////////////////

// Relay console.log() calls made by the page's own scripts, prefixed so they
// are not mistaken for the scraped HTML.
page.onConsoleMessage = function (msg) {
    console.log("[page console] " + msg);
};

////////////////////////////////////////////////////////////////////////////////

if (!url) {
    console.log("Usage: phantomjs scrape.js <url>");
    phantom.exit(1);
} else {
    // The callback runs once the page has finished loading, i.e. after the
    // scripts that run during page load. Content added later by timers or
    // XHR callbacks may not be in the DOM yet.
    page.open(url, function (status) {
        if (status !== "success") {
            console.log("Failed to load " + url);
            phantom.exit(1);
            return;
        }
        // page.evaluate() runs in the page context and returns the body's
        // current markup (after scripts modified the DOM) back to PhantomJS.
        var html = page.evaluate(function () {
            return document.body.innerHTML;
        });
        console.log(html);
        phantom.exit(0);
    });
}
