Newbie: How to overcome Javascript "onclick" button to scrape web page?

Question

This is the link I want to scrape: http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U

The "English Version" tab in the upper right-hand corner switches the page to English.

There is a button I have to press before I can read the fund information on the page. Until it is pressed the view is blocked, and scrapy shell always returns an empty list, [].

<div onclick="AgreeClick()" style="width:200px; padding:8px; border:1px black solid; 
background-color:#cccccc; cursor:pointer;">Confirmed</div>

And the function of AgreeClick is:

function AgreeClick() {
    var cookieKey = "ListFundShowDisclaimer";
    SetCookie(cookieKey, "true", null);
    Get("disclaimerDiv").style.display = "none";
    Get("blankDiv").style.display = "none";
    Get("screenDiv").style.display = "none";
    //Get("contentTable").style.display = "block";
    ShowDropDown();
}

How do I overcome this onclick="AgreeClick()" function to scrape the web page?

Community · Accepted Answer · 2017-05-23 12:17:05Z

5

You cannot just click the link inside scrapy (see Click a Button in Scrapy).

First of all, check whether the data you need is already in the HTML. Here it is: the content sits in the background, behind the disclaimer overlay, so it is present in the page source.
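To illustrate why hidden content can still be scraped, here is a minimal sketch using only the standard library. The sample markup, the table id and the values in it are made up for the example (they are not copied from the real page), and `TableRows` / `extract_rows` are hypothetical helper names. The point is only that an HTML parser ignores CSS and overlays, so anything present in the source is reachable.

```python
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for the real page: a disclaimer overlay
# plus a table that stays hidden until JavaScript reveals it.
SAMPLE_HTML = """
<div id="disclaimerDiv"><div onclick="AgreeClick()">Confirmed</div></div>
<table id="contentTable" style="display:none">
  <tr><td>2013-01-01</td><td>1.00</td></tr>
  <tr><td>2013-01-02</td><td>1.01</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that has no enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def extract_rows(html):
    """Return the table cells as a list of rows, ignoring any styling."""
    parser = TableRows()
    parser.feed(html)
    return parser.rows
```

Calling `extract_rows(SAMPLE_HTML)` returns both rows even though the table is styled `display:none`. If the same kind of check on the real response comes back empty, the data is probably loaded later by JavaScript and one of the options below is needed.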

Another option is selenium:

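from selenium import webdriver
from selenium.webdriver.common.by import By
import time

browser = webdriver.Firefox()
browser.get("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")

# Click the "Confirmed" div so that AgreeClick() runs in the browser.
elem = browser.find_element(By.XPATH, '//*[@id="disclaimer"]/div/div')
elem.click()
time.sleep(0.2)  # give the page a moment to hide the overlay

# Dump the rendered document, starting from the root element.
elem = browser.find_element(By.XPATH, "//*")
print(elem.get_attribute("outerHTML"))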

One more option is to use mechanize. It cannot execute JavaScript, but according to the page's source code, AgreeClick just sets the cookie ListFundShowDisclaimer to true. This is a starting point (not sure if it works):

try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3
import mechanize

br = mechanize.Browser()

# Pre-load the cookie that AgreeClick() would have set in a real browser.
cj = cookielib.CookieJar()
ck = cookielib.Cookie(version=0, name='ListFundShowDisclaimer', value='true', port=None, port_specified=False,
                      domain='www.prudential.com.hk', domain_specified=False, domain_initial_dot=False, path='/',
                      path_specified=True, secure=False, expires=None, discard=True, comment=None, comment_url=None,
                      rest={'HttpOnly': None}, rfc2109=False)
cj.set_cookie(ck)
br.set_cookiejar(cj)

br.open("http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=MMFU_U")
print(br.response().read())

Then, you can parse the result with BeautifulSoup or whatever you prefer.
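If you would rather avoid a third-party browser library, the same cookie idea can be sketched with the Python 3 standard library alone. This is untested against the live site and rests on the same assumption as the mechanize snippet, namely that the server only looks at the ListFundShowDisclaimer cookie. The helper names (`make_disclaimer_cookie`, `build_opener_with_cookie`, `fetch_fund_page`) are made up for this example.

```python
import http.cookiejar
import urllib.request

URL = ("http://www.prudential.com.hk/PruServlet"
       "?module=fund&purpose=searchHistFund&fundCd=MMFU_U")


def make_disclaimer_cookie(domain="www.prudential.com.hk"):
    """Build the cookie that AgreeClick() would set in a real browser."""
    return http.cookiejar.Cookie(
        version=0, name="ListFundShowDisclaimer", value="true",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None,
        rest={"HttpOnly": None}, rfc2109=False,
    )


def build_opener_with_cookie():
    """Return (opener, jar); the jar already holds the disclaimer cookie."""
    jar = http.cookiejar.CookieJar()
    jar.set_cookie(make_disclaimer_cookie())
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_fund_page(url=URL, timeout=30):
    """Download the page with the cookie attached (performs a network call)."""
    opener, _ = build_opener_with_cookie()
    return opener.open(url, timeout=timeout).read()
```

Calling `fetch_fund_page()` returns the raw response bytes, which can then go to the parser of your choice. With the requests library the equivalent would be passing `cookies={"ListFundShowDisclaimer": "true"}` to `requests.get`; whether the server accepts either variant is something you would have to verify.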

edited May 23, 2017 at 12:17

CommunityBot

answered May 7, 2013 at 18:59

alecxe

1 Comment

Shaardool Over a year ago

Do you also have a solution using Requests? I am using Requests and need to do this.

igauravsehrawat · Accepted Answer · 2014-10-17 09:57:11Z

5

Use the spynner library for Python to emulate a browser and execute the client-side javascript.

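import spynner

browser = spynner.Browser()
url = "http://www.prudential.com/path/?args=values"  # placeholder; use the real page URL

browser.load(url)

# Run the page's own function, as if the button had been clicked.
browser.runjs("AgreeClick();")

# Grab the HTML as it stands after the script has run.
markup = browser._get_html()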

As you can see, you can invoke any Javascript function available in the source of the page programmatically.

If you also need to parse results, I highly recommend BeautifulSoup.

edited Oct 17, 2014 at 9:57

igauravsehrawat

answered May 7, 2013 at 14:15

pztrick
