
I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string

But it doesn't display any data.

Any ideas?

Comment: I couldn't see a class 'spad' on that page; are you sure it's correct? (May 26, 2013 at 19:26)

1 Answer


First of all, the table's class is StandardResultsGrid, not spad.

Second, you don't need the .tbody lookup; calling the table directly finds its rows:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):
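A minimal sketch of why dropping .tbody helps, written for Python 3 and assuming beautifulsoup4 is installed. The HTML string is a hypothetical stand-in for the real results page, which may well have changed since 2013. Browser developer tools always display a tbody element, but raw HTML often lacks one and html.parser does not add it, so .tbody can come back as None; calling the table itself searches every descendant row either way.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for the real results page.
# It has no <tbody>, which is common in raw HTML even though browser
# dev tools always show one.
html = """
<table class="StandardResultsGrid">
  <tr><th>Name</th><th>Distance</th></tr>
  <tr><td>Clinic A</td><td>1.2</td></tr>
  <tr><td>Clinic B</td><td>3.4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

# html.parser does not invent a <tbody>, so this is None and chaining
# .tbody('tr') would raise TypeError.
print(table.tbody)  # -> None

# Calling the table directly searches all descendants, tbody or not.
rows = table("tr")
print(len(rows))  # -> 3
```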

Also note that on the original page the header row is, for some reason, included in tbody, so you'll have to skip the first row:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
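Here is a short Python 3 sketch of the header-skipping loop on another hypothetical sample in which, as described for the original page, the header row sits inside tbody. It also swaps .string for get_text(strip=True): .string returns None as soon as a cell has more than one child, which is one more way the original loop can print None instead of data.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the header row sits inside <tbody>, so it is
# returned along with the data rows.
html = """
<table class="StandardResultsGrid"><tbody>
  <tr><td>Name</td><td>Distance</td></tr>
  <tr><td>Clinic A</td><td>1.2</td></tr>
  <tr><td>Clinic B</td><td>3.4</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

records = []
for row in table("tr")[1:]:  # [1:] drops the header row
    tds = row("td")
    # get_text(strip=True) is more forgiving than .string, which is None
    # whenever a cell has more than one child.
    records.append((tds[0].get_text(strip=True), tds[1].get_text(strip=True)))

for name, distance in records:
    print(name, distance)
```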

Finally, note that some cells contain nested tables, so you'll have to parse the contents of the td elements carefully.
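A rough end-to-end sketch, again in Python 3 on hypothetical markup. Passing recursive=False keeps the rows of nested tables out of the row list, get_text(" ", strip=True) flattens each cell (nested table included) into one string, and the standard csv module writes a file that Excel can open. The filename providers.csv and the column names are made up for illustration; writing a native .xlsx file would need a third-party library such as openpyxl instead.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical sample: the second cell of each data row holds a nested
# table (office hours), so a plain table("tr") would also return the
# nested rows.
html = """
<table class="StandardResultsGrid"><tbody>
  <tr><td>Name</td><td>Hours</td></tr>
  <tr><td>Clinic A</td>
      <td><table><tr><td>Mon 8-5</td></tr><tr><td>Tue 8-5</td></tr></table></td></tr>
  <tr><td>Clinic B</td>
      <td><table><tr><td>Wed 9-1</td></tr></table></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

# Only look one level down so nested tables are not picked up; fall back
# to the table itself if the markup has no <tbody>.
body = table.find("tbody", recursive=False) or table
rows = body("tr", recursive=False)

records = []
for row in rows[1:]:  # skip the header row
    cells = row("td", recursive=False)
    # Flatten each cell, including any nested table, into one string.
    records.append([cell.get_text(" ", strip=True) for cell in cells])

# Write a CSV file Excel can open; the "utf-8-sig" BOM helps Excel detect UTF-8.
with open("providers.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Hours"])
    writer.writerows(records)
```

Without recursive=False, table("tr") on this sample returns six rows instead of three, because the three nested rows are included.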
