Scrape Table Data from Website

Question

I am trying to scrape table data from a website using BeautifulSoup4 and Python, and then create an Excel document with the results. So far, I have this:

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://opl.tmhp.com/ProviderManager/SearchResults.aspx?TPI=&OfficeHrs=4&ProgType=STAR&UCCIndicator=No+Preference&Cnty=&NPI=&Srvs=6&Age=All&Gndr=B&SortBy=Distance&ZipCd=78552&SrvsOfrd=0&SpecCd=0&Name=&CntySrvd=0&Plan=H3&WvrProg=0&SubSpecCd=0&AcptPnt=Y&Rad=200&LangCd=99').read())

for row in soup('table', {'class' : 'spad'})[0].tbody('tr'):
    tds = row('td')
    print tds[0].string, tds[1].string

But it doesn't display any data.

Any ideas?

I couldn't see a class 'spad' on that page. Are you sure it's correct? - scdove, May 26, 2013 at 19:26

kirelagin · Accepted Answer · 2013-05-26 19:41:53Z


First of all, the table's class is StandardResultsGrid, not spad.
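The snippet below is a minimal sketch of why the original loop prints nothing. It runs both lookups against a hypothetical, simplified stand-in for the results table. The HTML string is an assumption, not the real page. It also assumes Python 3 with bs4 installed, whereas the question's code is Python 2. Calling the soup is shorthand for find_all(), which returns a possibly empty list. With the wrong class there is nothing for [0] to pick up.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the results-page markup.
HTML = """
<table class="StandardResultsGrid">
  <tr><td>Name</td><td>Phone</td></tr>
  <tr><td>Clinic A</td><td>555-0100</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# soup(...) is shorthand for soup.find_all(...): it returns a list of
# matching tags, which is empty when nothing matches.
wrong = soup("table", {"class": "spad"})
right = soup("table", {"class": "StandardResultsGrid"})

print(len(wrong))  # 0, so wrong[0] would raise IndexError
print(len(right))  # 1
```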

Second, you don't need the .tbody lookup. Simply use:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr'):
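The sketch below checks why .tbody is unnecessary. It uses two hypothetical variants of the same table, one with a tbody and one without, and assumes Python 3 and bs4 with the built-in html.parser. Calling a tag searches all of its descendants, so rows inside a tbody are found without naming it. Going through .tbody only adds a failure point, because .tbody is None when the markup has no tbody element. inspect_table is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Two hypothetical variants of the same table, with and without <tbody>.
WITH_TBODY = """<table class="StandardResultsGrid"><tbody>
<tr><td>Name</td><td>Phone</td></tr>
<tr><td>Clinic A</td><td>555-0100</td></tr>
</tbody></table>"""
WITHOUT_TBODY = WITH_TBODY.replace("<tbody>", "").replace("</tbody>", "")

def inspect_table(html):
    """Return (number of rows found, whether a <tbody> exists)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup("table", {"class": "StandardResultsGrid"})[0]
    # table("tr") is shorthand for table.find_all("tr"), which searches all
    # descendants, so rows nested inside a <tbody> are found either way.
    return len(table("tr")), table.tbody is not None

print(inspect_table(WITH_TBODY))     # (2, True)
print(inspect_table(WITHOUT_TBODY))  # (2, False): table.tbody("tr") would fail here
```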

Also note that, since the original page puts the header row inside tbody for some reason, you'll have to skip the first row:

for row in soup('table', {'class' : 'StandardResultsGrid'})[0]('tr')[1:]:
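Here is a short sketch of the full loop with the header skipped. It assumes Python 3, bs4, and hypothetical markup in which the header row is an ordinary row of td cells. The answer does not say whether the real header uses td or th. The sketch also swaps the question's .string for get_text(). .string returns None whenever a cell has more than one child, such as a link followed by text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the header row sits inside <tbody> like any other row.
HTML = """<table class="StandardResultsGrid"><tbody>
<tr><td>Name</td><td>Phone</td></tr>
<tr><td><a href="#">Clinic A</a> (Main)</td><td>555-0100</td></tr>
<tr><td>Clinic B</td><td>555-0199</td></tr>
</tbody></table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup("table", {"class": "StandardResultsGrid"})[0]

records = []
for row in table("tr")[1:]:  # [1:] skips the header row
    tds = row("td")
    # .string is None for a cell with several children (here a link plus
    # text), so get_text() is used to collect all of the cell's text.
    records.append((tds[0].get_text(" ", strip=True),
                    tds[1].get_text(" ", strip=True)))

for name, phone in records:
    print(name, phone)
```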

Note also that some cells contain tables of their own, so you'll have to parse the contents of the td elements carefully.
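The nested tables also affect the loop itself. table('tr') searches all descendants, so it returns the rows of any table nested in a cell as well, mixed in with the outer rows. One way to handle this is sketched below on hypothetical markup, assuming Python 3 and bs4. Passing recursive=False visits only the outer table's own rows and cells, and a nested table is then read explicitly where one is expected. direct_rows is a hypothetical helper, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second cell of the data row holds its own table.
HTML = """<table class="StandardResultsGrid"><tbody>
<tr><td>Name</td><td>Hours</td></tr>
<tr><td>Clinic A</td><td><table>
  <tr><td>Mon</td><td>8-5</td></tr>
  <tr><td>Tue</td><td>8-5</td></tr>
</table></td></tr>
</tbody></table>"""

def direct_rows(table):
    """Return only the table's own rows, not rows of tables nested in cells."""
    # Rows may sit directly under <table> or under a <tbody> child.
    body = table.find("tbody", recursive=False)
    if body is None:
        body = table
    return body.find_all("tr", recursive=False)

soup = BeautifulSoup(HTML, "html.parser")
outer = soup("table", {"class": "StandardResultsGrid"})[0]

print(len(outer("tr")))  # 4: the recursive search also finds the nested rows
rows = direct_rows(outer)
print(len(rows))         # 2: the header row plus one data row

for row in rows[1:]:
    name_td, hours_td = row.find_all("td", recursive=False)
    # Read the nested table row by row instead of flattening it to one string.
    hours = [tuple(td.get_text(strip=True) for td in r("td"))
             for r in hours_td.find("table")("tr")]
    print(name_td.get_text(strip=True), hours)
```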

