
First, please check the image below so I can better explain my question:

[Image: screenshot of the webpage with part of its HTML source opened]

I am trying to take user input to select one of the links under "Course Search By Term" (e.g. Winter 2015).

The HTML shown in the screenshot is part of this webpage's source. I would like to grab all the href links in the element, which contains the five term links I want. I am following the instructions from this tutorial (www.gregreda.com/2013/03/03/web-scraping-101-with-python/), but it doesn't explain this part. Here is the code I have been trying:

from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2

BASE_URL = "http://classes.uoregon.edu/"

def get_category_links(section_url):

    html = urlopen(section_url).read()
    soup = BeautifulSoup(html, "lxml")
    # The term list lives in <td class="pldefault">
    pldefault = soup.find("td", "pldefault")
    ul_links = pldefault.find("ul")
    # Loop over the <li> items and take each one's <a href>
    category_links = [BASE_URL + li.a["href"] for li in ul_links.findAll("li")]

    return category_links
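A minimal Python 3 sketch of the navigation `get_category_links` is aiming for (the `td` with class `pldefault`, then its `ul`, then each `li`'s `a`), run on a small inline HTML string instead of the live page. The markup in `SAMPLE_HTML` is an assumption modeled on the question's `find` calls, not copied from the site, and this version takes HTML rather than a URL so it can run offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://classes.uoregon.edu/"

# Hypothetical markup modeled on the question's find() calls; the real page may differ.
SAMPLE_HTML = """
<table><tr>
  <td class="pldefault">
    <ul>
      <li><a href="/pls/prod/hwskdhnt.p_search?term=201402">Winter 2015</a></li>
      <li><a href="/pls/prod/hwskdhnt.p_search?term=201403">Spring 2015</a></li>
    </ul>
  </td>
</tr></table>
"""


def get_category_links(html):
    """Return absolute URLs for every <li><a href> inside td.pldefault > ul."""
    soup = BeautifulSoup(html, "html.parser")
    # A string as the second argument to find() filters on the CSS class.
    pldefault = soup.find("td", "pldefault")
    ul = pldefault.find("ul")
    # Iterate over the <li> items (not nested <ul>s) and read each <a>'s href,
    # skipping any <li> that has no link.
    return [urljoin(BASE_URL, li.a["href"]) for li in ul.find_all("li") if li.a]


if __name__ == "__main__":
    for url in get_category_links(SAMPLE_HTML):
        print(url)
```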

Any help is appreciated! Thanks. If you would like to see the website, it's classes.uoregon.edu/

1 Answer


I would keep it simple and locate all links that contain "2015" in their text and "term" in their href:

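from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2, as in the question
url = "http://classes.uoregon.edu/"
soup = BeautifulSoup(urlopen(url).read(), "lxml")
for link in soup.find_all("a",
                          href=lambda href: href and "term" in href,   # "term" in the href
                          text=lambda text: text and "2015" in text):  # "2015" in the link text
    print link["href"]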

Prints:

/pls/prod/hwskdhnt.p_search?term=201402
/pls/prod/hwskdhnt.p_search?term=201403
/pls/prod/hwskdhnt.p_search?term=201404
/pls/prod/hwskdhnt.p_search?term=201406
/pls/prod/hwskdhnt.p_search?term=201407
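The same filter can be tried offline. This is a Python 3 sketch against a hypothetical HTML snippet (not the real page), including two links that should be rejected. It uses `string=`, which BeautifulSoup 4 accepts as the newer name for the `text=` filter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two 2015 term links plus two links that should be filtered out.
SAMPLE_HTML = """
<ul>
  <li><a href="/pls/prod/hwskdhnt.p_search?term=201402">Winter 2015</a></li>
  <li><a href="/pls/prod/hwskdhnt.p_search?term=201403">Spring 2015</a></li>
  <li><a href="/pls/prod/hwskdhnt.p_search?term=201301">Fall 2013</a></li>
  <li><a href="/about.html">About 2015 schedules</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Both filters must match: "term" in the href AND "2015" in the link text.
# The `x and ...` guards cover tags with no href or no single text string (None).
term_links = [
    link["href"]
    for link in soup.find_all(
        "a",
        href=lambda href: href and "term" in href,
        string=lambda text: text and "2015" in text,
    )
]

for href in term_links:
    print(href)
```

"Fall 2013" fails the text filter and "About 2015 schedules" fails the href filter, so only the two term links are kept.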

If you want full URLs, use urlparse.urljoin() to join the links with a base URL:

from urlparse import urljoin

...
for link in soup.find_all("a",
                          href=lambda href: href and "term" in href,
                          text=lambda text: text and "2015" in text):
    print urljoin(url, link["href"])

This would print:

http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201402
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201403
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201404
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201406
http://classes.uoregon.edu/pls/prod/hwskdhnt.p_search?term=201407
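The question's original goal was to let a user pick a term such as "Winter 2015". One way to combine that with this answer, sketched in Python 3 (where `urllib.parse.urljoin` replaces Python 2's `urlparse.urljoin`), is to map each term link's text to its full URL and look up the user's choice. The helper names `term_urls` and `pick_term` and the sample HTML are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://classes.uoregon.edu/"

# Hypothetical sample markup; the real page's link text may differ.
SAMPLE_HTML = """
<a href="/pls/prod/hwskdhnt.p_search?term=201402">Winter 2015</a>
<a href="/pls/prod/hwskdhnt.p_search?term=201403">Spring 2015</a>
"""


def term_urls(html, base_url=BASE_URL):
    """Map each term link's text (e.g. "Winter 2015") to its absolute URL."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", href=lambda href: href and "term" in href)
    return {link.get_text(strip=True): urljoin(base_url, link["href"]) for link in links}


def pick_term(urls, choice):
    """Return the URL for a term name typed by the user, ignoring case and extra spaces."""
    wanted = " ".join(choice.split()).lower()
    for name, url in urls.items():
        if name.lower() == wanted:
            return url
    raise KeyError(f"unknown term: {choice!r}")


if __name__ == "__main__":
    urls = term_urls(SAMPLE_HTML)
    print("Available terms:", ", ".join(urls))
    # In an interactive script, replace the literal with input("Term: ").
    print(pick_term(urls, "Winter 2015"))
```

urljoin also keeps the slashes right: BASE_URL ends in "/" and these hrefs start with "/", so plain concatenation such as BASE_URL + href would give "http://classes.uoregon.edu//pls/...".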

1 Comment

Thanks for answering! I ended up figuring it out. Just read the BeautifulSoup documentation.
