
I have a Python crawler that uses PhantomJS to crawl sites, and I am trying to stop CSS content from loading on those pages. I found the following code in various internet sources to stop CSS loading, but it is not working. Please help me fix this issue. I also tried other solutions mentioned on Stack Overflow, but those didn't work either.

driver = webdriver.PhantomJS()

driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')
driver.execute('executePhantomScript', {'script': '''
var page = this;
page.onResourceRequested = function(requestData, request) {
 if ((/http:\/\/.+?\.css/gi).test(requestData['https://www.whatismyip.com/']) || requestData.headers['Content-Type'] == 'text/css') {
        console.log('The url of the request is matching. Aborting: ' + requestData['https://www.whatismyip.com/']);
        request.abort();
}
''', 'args': []})

driver.get("https://www.whatismyip.com/")
ipaddress=driver.find_element_by_xpath("//div[@class='ip']").text
print ipaddress
driver.quit()
  • An alternative option could be to start up a proxy and let it filter out requests with text/css mimetype. And here is how you can specify it when initializing PhantomJS webdriver instance: stackoverflow.com/questions/14699718/…. Commented Sep 10, 2015 at 12:28
  • Hi, thanks for your suggestion. I saw that link; it describes how to set the proxy, and I already have my proxy settings as follows: service_args = [--proxy=x.x.x.x:8080,'--proxy-type=http','--web-security=false','--ignore-ssl-errors=true','--local-to-remote-url-access=true',] webdriver.PhantomJS.__init__(self,service_args=service_args,desired_capabilities=dcap). Could you please suggest what change I have to make to these settings? Commented Sep 10, 2015 at 12:59

1 Answer

You're testing the regex against requestData['https://www.whatismyip.com/'], which I'm assuming is undefined, since requestData has no property with that name. This is fixed by using requestData.url, as per the documentation. Also, a request for a stylesheet will generally not carry a Content-Type header, so this conditional can be removed.

I chose to simplify your regular expression, since some URLs may be served over SSL (https://) or be relative, and those will not match http://. I use a $ anchor to test for .css at the end of the URL (the global modifier is not necessary, since you're only looking for one match).
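To make the difference between the two patterns concrete, here is a small sketch that uses Python's re module as a stand-in for the JavaScript regexes. The URLs are hypothetical and chosen only to illustrate each case; re.search is assumed to behave like JavaScript's .test() for these simple patterns.

```python
import re

# Python's `re` stands in for the JavaScript regexes here.
ORIGINAL = re.compile(r"http://.+?\.css", re.IGNORECASE)  # pattern from the question
SIMPLIFIED = re.compile(r"\.css$", re.IGNORECASE)         # pattern from this answer

# Hypothetical URLs, used only for illustration.
urls = [
    "http://example.com/static/site.css",
    "https://example.com/static/site.css",
    "/static/SITE.CSS",
    "https://example.com/static/site.css?v=2",
    "https://example.com/index.html",
]


def is_css(pattern, url):
    """Return True if the pattern matches anywhere in the URL."""
    return pattern.search(url) is not None


for url in urls:
    print(url, is_css(ORIGINAL, url), is_css(SIMPLIFIED, url))
```

Note that the $ anchor does not match a stylesheet URL that carries a query string such as ?v=2; whether that matters depends on the site being crawled.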

Your final .onResourceRequested callback may contain a conditional like this:

if(/\.css$/i.test(requestData.url)) {
    request.abort();
}
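For reference, here is a minimal sketch of how the whole script might be assembled on the Python side. It reuses the command name and endpoint from the question; the helper name install_css_blocker and the constant BLOCK_CSS_SCRIPT are hypothetical, introduced only for this example. It also closes the callback with };, which the snippet in the question appears to leave out. I have not verified this against a live PhantomJS session.

```python
# JavaScript that runs inside PhantomJS; `this` is assumed to be the current page,
# as in the question. A raw string keeps the regex backslash intact.
BLOCK_CSS_SCRIPT = r"""
var page = this;
page.onResourceRequested = function (requestData, request) {
    // Abort any request whose URL ends in .css (case-insensitive).
    if (/\.css$/i.test(requestData.url)) {
        request.abort();
    }
};
"""


def install_css_blocker(driver):
    """Register the PhantomJS-specific command and install the callback.

    `driver` is expected to be a selenium PhantomJS webdriver instance.
    """
    driver.command_executor._commands['executePhantomScript'] = (
        'POST', '/session/$sessionId/phantom/execute')
    return driver.execute('executePhantomScript',
                          {'script': BLOCK_CSS_SCRIPT, 'args': []})
```

The helper would be called once after webdriver.PhantomJS() is created and before driver.get(...), so that the callback is in place when the page starts loading.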

2 Comments

Thanks for your effort. I modified the code as follows, but it is still not working:

driver = webdriver.PhantomJS()
driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')
driver.execute('executePhantomScript', {'script': '''
var page = this;
page.onResourceRequested = function(requestData, request) {
if(/\.css$/i.test(requestData.url)) {
    request.abort();
}
''', 'args': []})
driver.get("https://www.whatismyip.com/")
ipaddress=driver.find_element_by_xpath("//div[@class='ip']").text
print ipaddress
driver.quit()

Do I have to make any other changes to the code above? I am really confused about how to achieve this.
