
I just started programming a few hours ago, working through an example from Nathan Yau's Visualize This. I'm using Python 3.3.4 and BeautifulSoup4. Though the book seems to use Python 2.x, I've managed to figure out the updated code for my first data scraping exercise, which uses historical data from wunderground.com:

>>> maxTemp = soup.findAll(attrs={"class":"nobr"})[5].span.string

>>> print(maxTemp)

The result, as it should be, is "37".

I included this bit of code in my first script and when I tried to run it from the command prompt, it starts but then I get the error:

"AttributeError: 'NoneType' object has no attribute 'string'"

As you can imagine, it's frustrating to see my code work fine in the Python IDLE GUI but not from the script. I've looked around for answers and tried different things, but I'm now definitely stuck. Any suggestions?

EDIT: Adding more code for my example. This is from the script that fails:

url = "http://www.wunderground.com/history/airport/KBOS/2013/" + str(m) + "/" + str(d) + "DailyHistory.html"
page = urllib.request.urlopen(url)
# Get daily maximum temperature from page
soup = BeautifulSoup(page)
# maxTemp = soup.body.nobr.b.string
maxTemp = soup.findAll(attrs={"class":"nobr"})[5].span.string

Again, it fails when I run it from my terminal:

C:\Python33>python get-weather-data.py
Getting data for 201311
Traceback (most recent call last):
  File "get-weather-data.py", line 28, in <module>
    maxTemp = soup.findAll(attrs={"class":"nobr"})[5].span.string
AttributeError: 'NoneType' object has no attribute 'string'

Even though it works fine here in IDLE:

import urllib.request
page = urllib.request.urlopen("http://www.wunderground.com/history/airport/KBOS/2013/1/1/DailyHistory.html")
from bs4 import BeautifulSoup
soup = BeautifulSoup(page)
maxTemp = soup.findAll(attrs={"class":"nobr"})[5].span.string
print(maxTemp)
  • You're trying to get .string from soup.findAll(attrs={"class":"nobr"})[5].span, but the latter is None, not whatever you're expecting. Why that is the case is difficult to say. You should post a full example instead of just one line. Commented Feb 14, 2014 at 1:37
  • I added some more of the example, but I'm not sure how much it might help. My assumption was that "string" referred to whatever string of characters might exist within the <span> tags (the first span being the sixth instance of a nobr class). The IDLE GUI would return the value seen on the site, "37", which indicates to me that it's something, not "None." But are you suggesting I have to tell the script that "string" means something, like "by string, Python, I mean whatever characters are there"? Commented Feb 14, 2014 at 2:11

2 Answers


I had the same issue. It's an amazing book, but it seems like the example was based on the previous version of Beautiful Soup. It'll work if you just drop the ".span" part. Like so:

maxTemp = soup.findAll(attrs={"class":"nobr"})[5].string
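If you'd rather not depend on whether the nested span is there, a small guard can handle both layouts. This is only a sketch: tag_text and max_temp_from_html are hypothetical helper names, not something from the book or from Beautiful Soup, and it assumes the temperature is still the sixth class="nobr" element on the page.

```python
def tag_text(tag):
    """Return the text of a Beautiful Soup tag, or None if the tag is missing."""
    if tag is None:
        return None
    # Prefer a nested <span> when there is one, otherwise use the tag itself.
    inner = tag.span if tag.span is not None else tag
    # get_text() joins all nested strings, so it still returns text in cases
    # where .string would be None because the tag has several children.
    return inner.get_text(strip=True)


def max_temp_from_html(html, index=5):
    """Hypothetical helper: text of the index-th class="nobr" element."""
    # Imported here so that tag_text itself has no third-party dependency.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all(attrs={"class": "nobr"})
    if len(cells) <= index:
        return None  # fewer matches than expected, so the page layout differs
    return tag_text(cells[index])
```

In the script you would read the response first and pass the bytes in, e.g. maxTemp = max_temp_from_html(page.read()), then skip or log the day when the result is None instead of crashing.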

1 Comment

Pleasure man. If you could just mark it as solved with the check mark on the left, that would be great :)

You might try reading the web page, dumping the contents to a file, and comparing the files from your IDLE and script runs. The page contents may actually be different, in which case the error you're seeing could be correct behavior. Doing this would help narrow down the possible causes.
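A rough sketch of that comparison, using only the standard library. build_url and save_page are hypothetical names, and the URL pattern is copied from the IDLE session in the question. One thing worth checking while you're at it: the script as posted concatenates str(d) + "DailyHistory.html" with no "/" in between, whereas the IDLE session requests .../2013/1/1/DailyHistory.html. If that isn't just a typo in the post, the two runs are asking for different URLs.

```python
import urllib.request


def build_url(month, day, year=2013, station="KBOS"):
    """Build the daily-history URL in the form used in the IDLE session."""
    return (
        "http://www.wunderground.com/history/airport/"
        + station + "/" + str(year) + "/" + str(month) + "/" + str(day)
        + "/DailyHistory.html"
    )


def save_page(url, path):
    """Fetch url and write the raw bytes to path; return the byte count."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)
```

Call save_page(build_url(1, 1), "from_idle.html") in IDLE, do the same with the url string your script builds and a different file name, then diff the two files or open them in a browser.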
