I want to extract data such as company names and addresses from a website using BeautifulSoup, but I get the following error:

Calgary's Notary Public 
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    print item.find_all(class_='jsMapBubbleAddress').text
AttributeError: 'ResultSet' object has no attribute 'text'

Here is the HTML snippet. I want to extract all the text information and write it to a CSV file. Can anyone help?

<div class="listing__right article hasIcon">
   <h3 class="listing__name jsMapBubbleName" itemprop="name"><a data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"busname","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" title="See detailed information for Calgary's Notary Public">Calgary's Notary Public</a> </h3>
   <div class="listing__address address mainLocal">
      <em class="itemCounter">1</em>
      <span class="listing__address--full" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress">
      <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>, <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>, <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span> <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span></span>
      <a class="listing__direction" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1a","lk_relevancy":"1","lk_name":"directions-step1","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/merchant/directions/100971374?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" rel="nofollow" title="Get direction to Calgary's Notary Public">Get directions »</a>
   </div>
   <div class="listing__details">
      <p class="listing__details__teaser" itemprop="description">We  offer you a convenient, quick and affordable solution for your Notary Public or Commissioner for Oaths in Calgary needs.</p>
   </div>
   <div class="listing__ratings--root">
      <div class="listing__ratings ratingWarp" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating">
         <meta content="5" itemprop="ratingValue"/>
         <meta content="1" itemprop="ratingCount"/>
         <span class="ypStars" data-analytics-group="stars" data-clicksent="false" data-rating="rating5" title="Ratings: 5 out of 5 stars">
         <span class="star1" data-analytics-name="stars" data-label="Optional : Why did you hate it?" title="I hated it"></span>
         <span class="star2" data-analytics-name="stars" data-label="Optional : Why didn't you like it?" title="I didn't like it"></span>
         <span class="star3" data-analytics-name="stars" data-label="Optional : Why did you like it?" title="I liked it"></span>
         <span class="star4" data-analytics-name="stars" data-label="Optional : Why did you really like it?" title="I really liked it"></span>
         <span class="star5" data-analytics-name="stars" data-label="Optional : Why did you love it?" title="I loved it"></span>
         </span><a class="listing__ratings__count" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"read_yp_reviews","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true#ypgReviewsHeader" rel="nofollow" title="1 of Review for Calgary's Notary Public">1<span class="hidden-phone"> YP review</span></a>
      </div>
   </div>
   <div class="listing__details detailsWrap">
      <ul>
         <li><a href="/search/si/1/Notaries/Calgary%2C+AB" title="Notaries">Notaries</a>
            ,
         </li>
         <li><a href="/search/si/1/Notaries+Public/Calgary%2C+AB" title="Notaries Public">Notaries Public</a></li>
      </ul>
   </div>
</div>

There are many divs with the class listing__right article hasIcon, so I am using a for loop to extract the information from each one.

The Python code I have written so far is:

import requests
from bs4 import BeautifulSoup

url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB'
response = requests.get(url)
content = response.content

soup = BeautifulSoup(content)
g_data=soup.find_all('div', attrs={'class': 'listing__right article  hasIcon'})

for item in g_data:
    print item.find('h3').text
    #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text
    print item.find_all(class_='jsMapBubbleAddress').text
  • find_all returns a list, and lists in Python do not have a text property or attribute. Try iterating over that list returned on the last line of your code. Commented Mar 13, 2016 at 9:01
  • I want only the first matching element. Commented Mar 13, 2016 at 10:12
  • print item.find_all(class_='jsMapBubbleAddress')[0].text Commented Mar 13, 2016 at 10:12
  • You didn't mention that in your question. Commented Mar 13, 2016 at 10:30
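
As the comments point out, find_all returns a list-like ResultSet, so it has to be indexed or iterated before reading .text. A minimal sketch of the two ways to get only the first match follows. It assumes Python 3 and uses a trimmed-down copy of the question's markup with the built-in html.parser instead of fetching the live page; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the address markup from the question (assumption:
# the live page has the same structure).
html = """
<div class="listing__right article hasIcon">
  <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
  <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("div", class_="article")

# find_all() returns a ResultSet (a list subclass): index it or loop over it.
matches = item.find_all(class_="jsMapBubbleAddress")
first_by_index = matches[0].text if matches else None

# find() returns the first matching Tag directly, or None when nothing matches.
first_tag = item.find(class_="jsMapBubbleAddress")
first_by_find = first_tag.text if first_tag is not None else None

print(first_by_index)
print(first_by_find)
```

Using find() avoids the IndexError that find_all(...)[0] raises when a listing has no address, at the cost of an explicit None check.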

1 Answer

find_all returns a ResultSet (a list-like object), which has no text attribute, and that is why you get the error. I'm not sure exactly what output you are looking for, but this code seems to work:

import requests
from bs4 import BeautifulSoup

url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB'
response = requests.get(url)
content = response.content

soup = BeautifulSoup(content,"lxml")
g_data=soup.find_all('div', attrs={'class': 'listing__right article  hasIcon'})

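for item in g_data:
    print(item.find('h3').text)
    #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text
    # find_all returns a ResultSet, so loop over it to reach each tag's text
    address_parts = item.find_all(class_='jsMapBubbleAddress')
    for part in address_parts:  # separate name so the outer `item` is not shadowed
        print(part.text)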
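
Since the question also asks for CSV output, here is a rough sketch of one way to turn each listing into a row. It assumes Python 3, works on a trimmed-down copy of the question's HTML rather than the live page, and writes to an in-memory buffer so it runs anywhere. The names FIELDS and listing_to_row are hypothetical helpers, not part of BeautifulSoup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed-down copy of one listing from the question (assumption: every
# listing on the real page follows this structure).
html = """
<div class="listing__right article hasIcon">
  <h3 class="listing__name jsMapBubbleName" itemprop="name"><a href="#">Calgary's Notary Public</a> </h3>
  <span class="listing__address--full" itemprop="address">
    <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
    <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>,
    <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span>
    <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span>
  </span>
</div>
"""

# itemprop values used as CSV columns (hypothetical choice of columns).
FIELDS = ["streetAddress", "addressLocality", "addressRegion", "postalCode"]


def listing_to_row(item):
    """Build one CSV row (a dict) from a single listing <div>."""
    name_tag = item.find("h3")
    row = {"name": name_tag.get_text(strip=True) if name_tag else ""}
    for field in FIELDS:
        # find() returns a single Tag or None, so .text is safe after the check.
        tag = item.find(class_="jsMapBubbleAddress", itemprop=field)
        row[field] = tag.get_text(strip=True) if tag else ""
    return row


soup = BeautifulSoup(html, "html.parser")
rows = [listing_to_row(div) for div in soup.find_all("div", class_="listing__right")]

# Write to an in-memory buffer; swap in open("listings.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name"] + FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Looking up each address part by its itemprop attribute keeps the columns aligned even if a listing is missing one of the spans, which plain iteration over find_all would not.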