How do I grab all the links within an element in HTML using python?

Question

First, please check the image below so I can better explain my question:

[Image: screenshot of the page with its HTML source open]

I am trying to take user input to select one of the links below "Course Search By Term" (e.g., Winter 2015).

The opened HTML shows the relevant part of the source for this webpage. I would like to grab all the href links in the <ul> element, which contains the five term links I want. I am following the tutorial at www.gregreda.com/2013/03/03/web-scraping-101-with-python/, but it doesn't cover this part. Here is the code I have been trying:

from bs4 import BeautifulSoup
from urllib2 import urlopen

BASE_URL = "http://classes.uoregon.edu/"

def get_category_links(section_url):

    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    pldefault = soup.find("td", "pldefault")
    ul_links = pldefault.find("ul")
    category_links = [BASE_URL + ul.a["href"] for i in ul_links.findAll("ul")]

    return category_links

Any help is appreciated, thanks! If you would like to see the website, it's classes.uoregon.edu/

alecxe · Accepted Answer · 2015-03-17 03:04:02Z


I would keep it simple and locate all links containing 2015 in the text and term in href:

for link in soup.find_all("a",
                          href=lambda href: href and "term" in href,
                          text=lambda text: text and "2015" in text):
    print link["href"]

Prints:

/pls/prod/hwskdhnt.p_search?term=201402
/pls/prod/hwskdhnt.p_search?term=201403
/pls/prod/hwskdhnt.p_search?term=201404
/pls/prod/hwskdhnt.p_search?term=201406
/pls/prod/hwskdhnt.p_search?term=201407
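
For readers on Python 3, here is a minimal sketch of the same filter, restricted to the td.pldefault cell from the question. The sample HTML, the link labels, and the helper name term_links are made up for illustration; only the href pattern comes from the output above. It passes string= instead of text=, which is the name newer Beautiful Soup releases (4.4.0 and later) use for that argument, and it uses the built-in html.parser so lxml is not required.

```python
# Python 3 sketch. SAMPLE_HTML is invented and only mimics the structure
# described in the question; term_links is a hypothetical helper name.
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<table><tr>
  <td class="nav"><a href="/pls/prod/other?term=201402">Archive 2015</a></td>
  <td class="pldefault">
    <ul>
      <li><a href="/pls/prod/hwskdhnt.p_search?term=201402">Winter 2015</a></li>
      <li><a href="/pls/prod/hwskdhnt.p_search?term=201403">Spring 2015</a></li>
      <li><a href="/help.html">Help 2015</a></li>
      <li><a href="/pls/prod/hwskdhnt.p_search?term=201304">Summer 2014</a></li>
    </ul>
  </td>
</tr></table>
"""


def term_links(html, year="2015"):
    """Return hrefs of term links inside td.pldefault whose text mentions year."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", class_="pldefault")
    if cell is None:
        return []  # the expected cell is missing, so there is nothing to collect
    anchors = cell.find_all(
        "a",
        # Keep anchors that have an href containing "term" ...
        href=lambda href: href and "term" in href,
        # ... and whose text contains the requested year.
        string=lambda text: text and year in text,
    )
    return [a["href"] for a in anchors]


hrefs = term_links(SAMPLE_HTML)
for href in hrefs:
    print(href)
```

Calling find_all on the cell rather than on soup is what limits the search to links inside that one element; the link in the invented "nav" cell is skipped even though it matches both filters.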

If you want full URLs, use urlparse.urljoin() to join each link with a base URL (url in the snippet below is presumably the address of the page that was fetched):

from urlparse import urljoin

...
for link in soup.find_all("a",
                          href=lambda href: href and "term" in href,
                          text=lambda text: text and "2015" in text):
    print urljoin(url, link["href"])

This would print:

http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201402
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201403
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201404
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201406
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201407
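
To illustrate why urljoin is preferable to the BASE_URL + href concatenation in the question, here is a small standard-library sketch in Python 3, where urljoin lives in urllib.parse rather than urlparse. The href is one of the values printed above; the example.com address and the index.html page name are placeholders, not part of the original site.

```python
# Python 3: urljoin moved from the urlparse module to urllib.parse.
from urllib.parse import urljoin

BASE_URL = "http://classes.uoregon.edu/"
href = "/pls/prod/hwskdhnt.p_search?term=201402"

# Plain concatenation doubles the slash because both parts contribute one.
concatenated = BASE_URL + href
print(concatenated)  # http://classes.uoregon.edu//pls/prod/hwskdhnt.p_search?term=201402

# urljoin resolves the href against the base the way a browser would.
joined = urljoin(BASE_URL, href)
print(joined)  # http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201402

# A path without a leading slash is resolved relative to the base's directory
# (index.html is a placeholder page name).
relative = urljoin("http://classes.uoregon.edu/pls/prod/index.html",
                   "hwskdhnt.p_search?term=201402")

# An href that is already absolute is returned unchanged (placeholder URL).
absolute = urljoin(BASE_URL, "http://example.com/page")
```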


1 Comment

Josh Over a year ago

Thanks for answering! I ended up figuring it out. Just read the BeautifulSoup documentation.
