Unable to extract text from an HTML page in Python

Question

I am very new to web scraping. I read about BeautifulSoup and tried to use it, but I am not able to extract the text of elements with the class name "company-desc-and-sort-container". I cannot even extract the title from the HTML page. This is the code I tried:

from BeautifulSoup import BeautifulSoup
import requests

url = 'http://fortune.com/best-companies/'
r = requests.get(url)

soup = BeautifulSoup(r.text)

#print soup.prettify()[0:1000]
print soup.find_all("title")

letters = soup.find_all("div", class_="company-desc-and-sort-container")

I am getting the following error:

 print soup.find_all("title")
TypeError: 'NoneType' object is not callable

What's your BeautifulSoup version?

– eLRuLL, Commented Dec 20, 2016 at 14:22

alecxe · Accepted Answer · 2016-12-20 14:22:46Z

1

You are using BeautifulSoup version 3, which is not only no longer maintained, but also does not have the find_all() method. Since dot notation is a shortcut for find(), BeautifulSoup tries to find an element with the tag name "find_all", which returns None. It then executes None("title"), which results in:

TypeError: 'NoneType' object is not callable
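
To make the mechanism concrete, here is a minimal sketch using a toy class. TagLikeV3 is a hypothetical stand-in, not BeautifulSoup's real implementation; it only assumes that unknown attributes are forwarded to find(), as described above.

```python
class TagLikeV3:
    """Toy stand-in for a BeautifulSoup 3 tag (hypothetical, not the real class)."""

    def __init__(self, children):
        # Map of child tag name -> text, e.g. {"title": "Best Companies"}
        self.children = children

    def find(self, name):
        # Return the child's text, or None when no such child exists.
        return self.children.get(name)

    def __getattr__(self, name):
        # Python calls this only for attributes that are not otherwise defined,
        # so soup.find_all falls through to here and becomes find("find_all").
        return self.find(name)


soup = TagLikeV3({"title": "Best Companies"})

print(soup.title)     # dot notation acting as a shortcut for find("title")
print(soup.find_all)  # there is no "find_all" child, so this is None

error_message = None
try:
    soup.find_all("title")  # effectively None("title")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

Because find_all is not a real method on the toy class, the attribute lookup silently succeeds with None, and the failure only shows up when that None is called.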

Upgrade to BeautifulSoup version 4. Replace:

from BeautifulSoup import BeautifulSoup

with:

from bs4 import BeautifulSoup

Make sure the beautifulsoup4 package is installed:

pip install --upgrade beautifulsoup4
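
With the import changed, the calls from the question should work. The following is a sketch only, written for Python 3: it parses a small inline HTML string (hypothetical sample markup standing in for r.text) so it runs without a network request, and it assumes the built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the fetched page (r.text in the question).
html = """
<html>
  <head><title>Best Companies</title></head>
  <body>
    <div class="company-desc-and-sort-container">Example description</div>
  </body>
</html>
"""

# Name the parser explicitly; "html.parser" ships with Python.
soup = BeautifulSoup(html, "html.parser")

# find_all() exists in version 4 and returns a list-like ResultSet of tags.
titles = soup.find_all("title")
print(titles)

# class_ (with a trailing underscore) filters by CSS class.
containers = soup.find_all("div", class_="company-desc-and-sort-container")
texts = [div.get_text(strip=True) for div in containers]
print(texts)
```

Whether the real page contains these elements in the HTML that requests receives is a separate question that this sketch does not check.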


fat fantasma · Accepted Answer · 2016-12-20 14:31:36Z

0

soup.find_all("title")

is not finding a title tag. Also, the "find_all" method returns a list of matches (an empty list when nothing is found) rather than a single tag, so printing it shows a list, not the title text. To get only the first title tag, use the "find" method, which returns the first match or None.

Then check whether the HTML page even has a title tag: search for it, and only print the result if it is not None.
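
A minimal sketch of that check, assuming BeautifulSoup 4 and a hypothetical HTML string that deliberately has no title tag:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no <title> tag, to show the "not found" case.
html_without_title = "<html><body><p>No title here</p></body></html>"
soup = BeautifulSoup(html_without_title, "html.parser")

all_titles = soup.find_all("title")  # empty list-like ResultSet when nothing matches
first_title = soup.find("title")     # None when nothing matches

print(all_titles)

# Only read the text when a tag was actually found.
if first_title is not None:
    print(first_title.get_text())
else:
    print("No title tag found")
```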
