Follow a javascript link with mechanize and python

Question

I'm doing some web scraping and the project is almost done, except for I need to click a javascript link and I can't work out how to with Python and mechanize.

On one of the pages, a list of javascript links appear and I want to follow them in turn, scrape some data, and repeat. I know mechanize doesn't work with javascript but does anyone know a workaround? Here's the code I use to isolate the links:

for Auth in iterAuths:
     Auth = str(Auth.contents[0]).strip()
     br.find_link(text=Auth)

now if I do br.follow_link(text=Auth), I get an error urllib2.URLError: <urlopen error unknown url type: javascript>.

If I do print br.click_link(text=Auth'), it prints as Request for javascript:SendThePage('5660')

I just need to get through the javascript link. Can anyone help?

blakev · Accepted Answer · 2013-09-15 07:36:10Z

2

When I needed to do something similar, I looked at the links I was trying to follow.

Some of them were static links generated with javascript. They were predictable/consistent enough that I could manually generate a list before hand.

Others were just constructed URLs with parameters. These too could be analyzed before hand and generated python-side and passed as a request instead of a "click on this link."

If you need to actually execute the javascript, you could run a PyV8 + Mechanize hybrid. I've been playing with this a bit and it seems pretty cool. PyV8 bridges Python with the V8 Javascript engine allowing you to create JS environments and execute arbitrary code. It does a great job going back and forth between the two languages.

I don't have any sample code, but one of these 3 solutions will work for you :) Good luck!

answered Sep 15, 2013 at 7:36

blakev

4,5392 gold badges39 silver badges56 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Constantly Confused Over a year ago

Do you have any resources on installing PyV8, I'm having some trouble.

blakev Over a year ago

@ConstantlyConfused at this point, I'd just consider using Selenium with a PhantomJS backend so you can run it headless. That workflow has come so far, and mechanize is pretty dead.

Collectives™ on Stack Overflow

Follow a javascript link with mechanize and python

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related