python web scraping code error

Question

I am trying to learn the basics of web scraping in Python using Beautiful Soup. I came across this code in a document, but when I execute it I get an error. The code is:

import urllib2
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://www.bcsfootball.org’).read())

for row in soup('table', {'class': 'mod-data’})[0].tbody('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string

and the error is:

SyntaxError: Non-ASCII character '\xe2' in file ex.py on line 4, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Please help me solve this, and explain the line

for row in soup('table', {'class': 'mod-data’})[0].tbody('tr'):

Most sites give sample code without explaining where it comes from or what it means. Terms like class, tbody, etc. are a bit confusing. It would be really helpful if you could suggest a site, an ebook, or anything else.

Did you read the PEP pointed to in the error message?

– hd1, Commented Feb 22, 2014 at 16:07

Totem · Accepted Answer · 2014-02-22 17:11:53Z

3

You have a typo in this line:

soup = BeautifulSoup(urllib2.urlopen('http://www.bcsfootball.org’).read())

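Instead of a straight single quote (') after .org, you have a curly apostrophe (’, U+2019). Its UTF-8 encoding starts with the byte \xe2, which is the non-ASCII character the SyntaxError complains about.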

It should be something like:

soup = BeautifulSoup(urllib2.urlopen("http://www.bcsfootball.org").read())
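Curly quotes are easy to miss by eye, so it can help to let a script point them out. The following is only a sketch, written for Python 3 (the code in the question is Python 2); the helper name find_non_ascii and the sample text are made up for illustration.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, char, unicode_name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical sample: the string on line 1 is closed with a curly quote (U+2019).
sample = "url = 'http://www.bcsfootball.org\u2019\nprint(url)\n"

for line_no, col_no, ch, name in find_non_ascii(sample):
    print("line %d, column %d: %r (%s)" % (line_no, col_no, ch, name))
```

To check a real file, read its text first, for example with open(path, encoding="utf-8").read(), and pass the result to the function.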

Also:

You have the same issue in the following line: after mod-data, change the curly apostrophe to a straight quote.

Instead of the shorthand soup('table', {'class': 'mod-data'})[0].tbody('tr')

Try soup.find_all('table', {'class': 'mod-data'})[0].tbody('tr')

or .findAll for older versions of BeautifulSoup.

Calling a soup or tag object directly is equivalent to calling .find_all() on it, so the shorthand is not a syntax error once the quotes are fixed. The explicit method name, which returns a list of matches, is easier to read.

Read the BeautifulSoup docs and get the latest version (4) of BeautifulSoup.

The following code works for me:

import urllib2
from bs4 import BeautifulSoup # latest version bs4

soup = BeautifulSoup(urllib2.urlopen("http://www.bcsfootball.org").read())

for row in soup.find_all("table", {"class": "mod-data"})[0].tbody("tr"):
    tds = row("td")
    print tds[0].string, tds[1].string

Output:

1 Florida State
2 Auburn
3 Alabama
4 Michigan State
5 Stanford
6 Baylor
7 Ohio State
8 Missouri
9 South Carolina
10 Oregon
11 Oklahoma
12 Clemson
13 Oklahoma State
14 Arizona State
15 UCF
16 LSU
17 UCLA
18 Louisville
19 Wisconsin
20 Fresno State
21 Texas A&M;
22 Georgia
23 Northern Illinois
24 Duke
25 USC

If you are having problems using single-quotes on those lines, use double-quotes.

edited Feb 22, 2014 at 17:11

answered Feb 22, 2014 at 16:12

Totem


6 Comments

Totem Over a year ago

this now works for me.. please let me know if it works for you

FathimaBeevi Over a year ago

got an error again.'none type object is not callable'

Totem Over a year ago

can you post your full error/traceback at the bottom of your post pls? It's working fine for me

FathimaBeevi Over a year ago

Traceback (most recent call last):
  File "stack.py", line 6, in <module>
    for row in soup.find_all('table', {'class': 'mod-data'})[0].tbody('tr'):
TypeError: 'NoneType' object is not callable

Totem Over a year ago

Have you checked your indentation?


John Dorian · Accepted Answer · 2014-02-22 16:10:49Z

1

Try changing your fourth line from:

soup = BeautifulSoup(urllib2.urlopen('http://www.bcsfootball.org’).read())

To:

soup = BeautifulSoup(urllib2.urlopen("http://www.bcsfootball.org").read())

It looks like your second single quote was different from the first, so changing to double quotes should alleviate that error.

The code you are asking about is reading from a table. In HTML each row of a table is denoted by the <tr> tag, which your program is searching for and then reading from. Each cell within a row is a <td> tag, so you are then printing the first and second columns of every row in the table you found.
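Reading the line from left to right: soup('table', {'class': 'mod-data'}) finds every <table> whose class attribute is mod-data, [0] takes the first match, .tbody is the <tbody> element inside it, and ('tr') finds every row in that body. Inside the loop, row('td') finds the cells of one row. To make the nesting concrete, here is a rough sketch that walks the same tr/td structure using only Python 3's standard-library html.parser rather than BeautifulSoup. The sample markup and the class name RowCollector are made up for illustration, and unlike the original line this sketch does not filter on the table's class.

```python
from html.parser import HTMLParser

# Made-up sample markup; the real page's HTML is not shown in the question.
SAMPLE_HTML = """
<table class="mod-data">
  <tbody>
    <tr><td>1</td><td>Florida State</td></tr>
    <tr><td>2</td><td>Auburn</td></tr>
  </tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self.in_cell = False  # True while between <td> and </td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])      # a new row starts
        elif tag == "td" and self.rows:
            self.rows[-1].append("")  # a new cell in the current row
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data  # text inside the current cell


collector = RowCollector()
collector.feed(SAMPLE_HTML)
for cells in collector.rows:
    # First and second column of each row, like tds[0] and tds[1].
    print(cells[0], cells[1])
```

BeautifulSoup does this bookkeeping for you, which is why the original loop fits in three lines.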

answered Feb 22, 2014 at 16:10

John Dorian


parul · Accepted Answer · 2015-05-27 10:37:13Z

0

Try changing your second line to:

from bs4 import BeautifulSoup

answered May 27, 2015 at 10:37

parul
