Python mechanize, following link by url and what is the nr parameter?

Question

I'm sorry to have to ask something like this, but Python's mechanize documentation seems to be really lacking and I can't figure this out. The only example I can find for following a link is:

response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

But I don't want to use a regex; I just want to follow a link based on its url. How would I do this? Also, what is the "nr" argument that is sometimes used when following links?

Thanks for any info

Just realized that I may have had an error in my headers which was preventing the links from working. Thanks to the people who helped; I think your answers will work for me. I also found another, more straightforward way to do it on another site, so I will post that here for reference once I'm done.
– Rick, Commented Aug 25, 2010 at 20:51

unutbu · Accepted Answer · 2010-08-26 12:17:07Z

50

br.follow_link takes either a Link object or a keyword arg (such as nr=0).

br.links() lists all the links.

br.links(url_regex='...') lists all the links whose urls match the regex.

br.links(text_regex='...') lists all the links whose link text matches the regex.

br.follow_link(nr=num) follows the numth link on the page, with counting starting at 0. It returns a response object (the same kind that br.open(...) returns).

br.find_link(url='...') returns the Link object whose url exactly equals the given url.

br.find_link, br.links, br.follow_link, br.click_link all accept the same keywords. Run help(br.find_link) to see documentation on those keywords.
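A caveat on exact url matching: as far as I know, the url keyword is compared against the href as it is written in the page, which may be relative, while Link objects also carry an absolute_url. The sketch below uses only the standard library to show why a full URL can fail to equal a relative href. The base URL, hrefs and target are made-up values, not taken from any real page.

```python
from urllib.parse import urljoin

# Hypothetical values: a page URL and hrefs as they might appear in its HTML.
base_url = "http://www.example.com/docs/"
hrefs = ["rfc2606.txt", "/about.html", "http://www.rfc-editor.org/rfc/rfc2606.txt"]

target_url = "http://www.example.com/docs/rfc2606.txt"

# Comparing the raw href finds nothing, because the href is relative.
raw_matches = [h for h in hrefs if h == target_url]

# Resolving each href against the page URL first makes the comparison work.
resolved_matches = [h for h in hrefs if urljoin(base_url, h) == target_url]

print(raw_matches)
print(resolved_matches)
```

If an exact url match never succeeds, printing link.url for every link on the page is a quick way to check whether the hrefs are relative.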

Edit: If you have a target url that you wish to follow, you could do something like this:

import mechanize
br = mechanize.Browser()
response=br.open("http://www.example.com/")
target_url='http://www.rfc-editor.org/rfc/rfc2606.txt'
for link in br.links():
    print(link)
    # Link(base_url='http://www.example.com/', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])
    print(link.url)
    # http://www.rfc-editor.org/rfc/rfc2606.txt
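    if link.url == target_url:
        print('match found')
        # match found
        break
else:
    # the loop finished without a break, so no link had the target url
    raise LookupError('no link with url %s' % target_url)

br.follow_link(link)   # link still holds the matching value from the loop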
print(br.geturl())
# http://www.rfc-editor.org/rfc/rfc2606.txt

edited Aug 26, 2010 at 12:17

answered Aug 25, 2010 at 19:53

unutbu


5 Comments

unutbu Over a year ago

@Rick: If you loop through br.links(), you can look at the string link.url to figure out if you want to follow it or not. No regex required.

Rick Over a year ago

thanks, I think I got it now... I don't know what it is, but the version of mechanize that I have (the latest) doesn't seem to have much in its doc file, not sure why. Anyway, thanks for the help; I think I can get it based on what you said, will try

Rick Over a year ago

I still can't figure out how to get a link to match. I am trying to use the full url as the regex but it's not giving a match (when I do the for loop it never enters the loop, implying it is not getting any matches)

unutbu Over a year ago

@Rick: Regex is tricky. Some characters in your url, like .*+?()[], have special meanings in a regex pattern, as opposed to plain string comparison. Since you have the full url, you can use == to compare the url against link.url. I've added some code to show what I mean.
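To illustrate the point about special characters, here is a small standard-library sketch. The URL is a hypothetical one chosen because it contains "." and "?"; nothing here calls mechanize itself.

```python
import re

# Hypothetical URL containing characters that are special in a regex.
url = "http://www.example.com/page.php?id=1"

# Used as a pattern, "?" makes the preceding "p" optional and no longer
# stands for a literal question mark, so the URL does not match itself.
unescaped = re.search(url, url)

# re.escape backslash-escapes the special characters so the pattern
# matches the URL literally.
escaped = re.search(re.escape(url), url)

print(unescaped is None)
print(escaped.group(0) == url)
```

If a regex keyword really is needed, passing re.escape(url) as the pattern should behave like a literal match, though plain == comparison is simpler.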

Rick Over a year ago

thanks, I have a lot of regex experience; I think the issue was that I had a problem in my headers. I appreciate your help, and I found another way to do it without using regex, so I will post that for reference once I test it

Rick · Accepted Answer · 2010-08-25 21:10:27Z

16

I found this way to do it, for reference for anyone who doesn't want to use regex:

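import mechanize

br = mechanize.Browser()
r = br.open("http://www.somewebsite.com")
# find_link raises LinkNotFoundError if no link has exactly this url
br.find_link(url='http://www.somewebsite.com/link1.html')
req = br.click_link(url='http://www.somewebsite.com/link1.html')
br.open(req)
print(br.response().read())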

Or, it will work by the link's text also:

r = br.open("http://www.somewebsite.com")
br.find_link(text='Click this link')
req = br.click_link(text='Click this link')
br.open(req)
print(br.response().read())

answered Aug 25, 2010 at 21:10

Rick


1 Comment

unutbu Over a year ago

I like this solution a lot better than the one I suggested. (I think it even works without the calls to br.find_link). Please accept this one so it will bubble to the top.

jkerian · Accepted Answer · 2010-08-25 19:51:27Z

2

From looking at the code, I suspect you want

response1 = br.follow_link(link=LinkObjectToFollow)

nr is the same as documented under the find_link call.

EDIT: In my first cursory glance, I didn't realize "link" wasn't a simple link.

answered Aug 25, 2010 at 19:51

jkerian


2 Comments

jkerian Over a year ago

I found the 'nr' info in the code itself: _mechanize.py, in the docstring for find_link, right around line 614

Rick Over a year ago

oh right I didn't even think that they would have a doc file there different from the online version, as I'm used to it also being online, thanks for the tip

Yuda Prawira · Accepted Answer · 2010-10-03 12:51:17Z

2

nr selects which link to follow when more than one link matches the given text or url. The default is 0, so if you leave it out you follow the first matching link. For example, take this source:

<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>

In this example both links have the text "Click this link", but we want the one that points to link2.html, which is the second match:

br.click_link(text='Click this link', nr=1)

This gives you a request for link2.html; pass it to br.open, or call br.follow_link with the same arguments, to get the response.
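The selection rule can be modelled without mechanize at all. The following is only a toy sketch: pick_link is a hypothetical helper written for this illustration, not part of the mechanize API, and the links are plain (url, text) tuples standing in for the two anchors in the HTML above.

```python
# Toy stand-ins for the two links in the HTML above: (url, text) pairs.
links = [
    ("link.html", "Click this link"),
    ("link2.html", "Click this link"),
]


def pick_link(links, text, nr=0):
    """Return the nr-th link whose text equals `text` (hypothetical helper)."""
    matches = [link for link in links if link[1] == text]
    if nr >= len(matches):
        raise LookupError("no matching link number %d" % nr)
    return matches[nr]


first = pick_link(links, "Click this link")         # nr defaults to 0
second = pick_link(links, "Click this link", nr=1)  # second match

print(first[0])
print(second[0])
```

The count runs over the links that match the other criteria, not over every link on the page, which is why nr=1 lands on link2.html here.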

answered Oct 3, 2010 at 12:51

Yuda Prawira

