I am trying to scrape the numbers from HTML data so that I can get the sum of them. However, I am running into the above error when I try to run it. It is referring to the "data = " line. What is this error referring to in this line of code? Have I set the "for" loop up correctly? Thank you for your thoughts.

import urllib
from bs4 import BeautifulSoup

url = "http://python-data.dr-chuck.net/comments_42.html"
html = urllib.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")
tags = soup('span')
data = soup.findall("span", {"Comments":"Comments"})
numbers = [d.text for d in data]

summation = 0
for tag in tags:
    print tags
    y= tag.finall("span").text      
    summation = summation + int(y)                  
print summation

This is what the HTML data looks like:

<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>

1 Answer

First of all, BeautifulSoup has no findall() method; the method is find_all(). Also, your filter searches for span elements that have an attribute named Comments whose value is Comments, whereas the spans in your HTML have class="comments":

soup.findall("span", {"Comments":"Comments"})  

And since this is Python, you can add the numbers up much more easily with the built-in sum().

Fixed version:

data = soup.find_all("span", {"class": "comments"})
print sum(int(d.text) for d in data)  # prints 2482
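
To see the difference between the two filters without fetching the page, here is a minimal sketch that runs against the three sample rows from the question. It is written with Python 3 style print calls, unlike the Python 2 code above, and the surrounding table element and the names wrong, spans and total are only illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Sample rows copied from the question, so no network request is needed.
# The <table> wrapper is an assumption added to keep the fragment tidy.
html = """
<table>
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The filter from the question looks for an attribute named "Comments",
# which none of the spans have, so nothing matches.
wrong = soup.find_all("span", {"Comments": "Comments"})

# Filtering on the class attribute matches the three spans.
spans = soup.find_all("span", {"class": "comments"})

# Convert each span's text to an integer and add them up.
total = sum(int(span.text) for span in spans)

print(len(wrong), len(spans), total)
```

With only these three rows the total covers 90, 88 and 87; the full page contains more rows, so its sum is larger.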

1 Comment

You are right; I meant to use the find_all method, with the underscore, and the built-in sum function. I guess part of me is used to reaching for "for" loops. Thank you for your feedback.
