Getting None type error while using for loop to web scrape

Question

I get an error when I use a for loop in my web-scraping code.

Here is my code for the app.py file:

page_content = requests.get("http://books.toscrape.com/").content
parser = BookParser(page_content)

containers = parser.Content()
results = []

for container in containers:

    name = container.getName()
    link = container.getLink()
    price = container.getPrice()
    rating = container.getRating()

    results.append({'name': name,
                'link': link,
                'price': price,
                'rating': rating
                })  

print(results[4])

And this is the code for the class whose methods are called:

class BookParser(object):
    RATINGS = {
        'One': 1,
        'Two': 2,
        'Three': 3,
        'Four': 4,
        'Five': 5
    }
    
    def __init__(self, page):
        self.soup = BeautifulSoup(page, 'html.parser')

    def Content(self):
        return self.soup.find_all("li",attrs={"class": 'col-xs-6'})

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        return self.soup.find('h3').find('a')['href']
    
    def getPrice(self):
        locator = BookLocator.PRICE
        price = self.soup.select_one(locator).string
        
        pattern = r"[0-9\.]*"
        validator = re.findall(pattern, price)

        return float(validator[1])

    def getRating(self):
        locator = BookLocator.STAR_RATING
        rating = self.soup.select_one(locator).attrs['class']

        rating_number = BookParser.RATINGS.get(rating[1])
        return rating_number

and finally, this is the error:

Traceback (most recent call last):
  File "c:\Users\Utkarsh Kumar\Documents\Projects\milestoneP4\app.py", line 13, in <module>
    name = container.getName()
TypeError: 'NoneType' object is not callable

I don't understand why the getName() function is returning a NoneType.

Any help will be highly appreciated, as I am pretty new to web scraping.

PS: Using it without the for loop works just fine, something like this:

name = parser.getName()
print(name)

parser is a BookParser object, which is why calling .getName() on it works: you defined that method yourself. parser.Content() returns a list of BeautifulSoup elements, which don't have a .getName() method.
– sin tribu, Commented Aug 23, 2020 at 6:47
To save you time, I would consider rethinking your strategy. All of the functions in BookParser will return the same values every time. Since you are not modifying self.soup, parser.getName() will return A Light in the Attic every time, which is probably not what you want. In fairness, it's a pretty cool book though.
– sin tribu, Commented Aug 23, 2020 at 6:52
What might be a better alternative to this? I searched around the web and found that most examples do everything in one file, and that's not what I want. I want separate files to work with.
– Utkarsh, Commented Aug 23, 2020 at 6:55
I was assuming you were trying to get every book off the page, but maybe I'm wrong. What are you trying to do?
– sin tribu, Commented Aug 23, 2020 at 6:58
Yeah, your assumption is right. I want to get all the books and store each one as a dictionary in the results list.
– Utkarsh, Commented Aug 23, 2020 at 6:59

abdusco · Accepted Answer · 2020-08-23 07:15:48Z

containers = parser.Content() gives you a list of BS4 elements, not a BookParser instance. You can verify this using print(type(containers)).
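
As for why the message is "'NoneType' object is not callable" rather than an AttributeError: as far as I know, BeautifulSoup treats an unknown attribute on an element as a search for a child tag with that name. So container.getName behaves like container.find('getName'), which finds nothing and returns None, and calling None raises the TypeError. A minimal sketch of that, using a trimmed inline HTML snippet (an assumption for illustration) instead of a live request:

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical stand-in for one book entry on the page.
html = (
    '<li class="col-xs-6">'
    '<h3><a href="x.html" title="A Light in the Attic">A Light in the ...</a></h3>'
    '</li>'
)

soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", attrs={"class": "col-xs-6"})

# The element is a bs4 Tag, not a BookParser.
print(type(item).__name__)

# Unknown attribute: looked up as a child tag named "getName", which does not exist.
print(item.getName)

try:
    item.getName()  # same as calling None()
except TypeError as exc:
    message = str(exc)
    print(message)
```

The same lookup is what makes shortcuts like item.h3 work, which is why a misspelled or missing method name fails this way instead of raising AttributeError.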

To continue using .getName(), you can create a new class called Book, move .getName() and all related methods to it, and pass in a list item returned from the .Content() method (i.e. li.col-xs-6). Then you can call book.getName().

Something like this should work:

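import requests
from bs4 import BeautifulSoup


class Book:
    def __init__(self, el):
        # el is a single li.col-xs-6 element, not the whole page
        self.soup = el

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        ...

    def getPrice(self):
        ...

    def getRating(self):
        ...


def get_books(html: str) -> list:
    soup = BeautifulSoup(html, 'html.parser')
    # Wrap every book element on the page in its own Book instance
    return [Book(it) for it in soup.find_all("li", attrs={"class": 'col-xs-6'})]


html = requests.get("http://books.toscrape.com/").text
for b in get_books(html):
    print(b.getName())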
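
For completeness, here is a rough sketch of what the remaining methods might look like once they operate on a single element, ending with the list of dictionaries from the question. The selectors p.price_color and p.star-rating are my assumptions, read off the page's HTML; the question's BookLocator constants are not shown, so they may differ. The inline SAMPLE_HTML is a trimmed stand-in for the real page.

```python
import re

from bs4 import BeautifulSoup

# Trimmed stand-in for the real page (assumption for illustration).
SAMPLE_HTML = """
<ol>
  <li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <article class="product_pod">
      <p class="star-rating Three"></p>
      <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3>
      <div class="product_price"><p class="price_color">£51.77</p></div>
    </article>
  </li>
</ol>
"""

RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


class Book:
    def __init__(self, el):
        self.soup = el  # one <li> element, not the whole page

    def getName(self):
        return self.soup.find("h3").find("a")["title"]

    def getLink(self):
        return self.soup.find("h3").find("a")["href"]

    def getPrice(self):
        # Assumed selector; pulls the first number out of text like "£51.77".
        text = self.soup.select_one("p.price_color").get_text()
        return float(re.search(r"\d+(?:\.\d+)?", text).group())

    def getRating(self):
        # Assumed selector; the rating word is one of the element's classes.
        classes = self.soup.select_one("p.star-rating")["class"]
        word = next((c for c in classes if c in RATINGS), None)
        return RATINGS.get(word)


def get_books(html):
    soup = BeautifulSoup(html, "html.parser")
    return [Book(it) for it in soup.find_all("li", attrs={"class": "col-xs-6"})]


results = [
    {"name": b.getName(), "link": b.getLink(),
     "price": b.getPrice(), "rating": b.getRating()}
    for b in get_books(SAMPLE_HTML)
]
print(results[0])
```

Searching for the number with re.search avoids depending on which position the match lands at in a re.findall result, which is what the validator[1] index in the question relies on.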

edited Aug 23, 2020 at 7:15

answered Aug 23, 2020 at 6:47

abdusco

sin tribu · Accepted Answer · 2020-08-23 07:13:52Z

Each book in the list is in these li elements:

<li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <article class="product_pod">
        <div class="image_container">
            <a href="catalogue/a-light-in-the-attic_1000/index.html"><img src="media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg" alt="A Light in the Attic" class="thumbnail"></a>
        </div>
        <p class="star-rating Three">
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
            <i class="icon-star"></i>
        </p>
        <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
        <div class="product_price">
            <p class="price_color">£51.77</p>
            <p class="instock availability">
                <i class="icon-ok"></i>
                In stock
            </p>
            <form>
                <button type="submit" class="btn btn-primary btn-block" data-loading-text="Adding...">Add to basket</button>
            </form>
        </div>
    </article>
</li>

The point is to make a class that operates on a single list element rather than on the soup object, which is your whole page. For example:

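class BookParser:
    def __init__(self, book_item):
        # book_item is the <li> element for a single book
        self.book_item = book_item

    def getName(self):
        # The full title is in the title attribute; the link text is truncated
        return self.book_item.find('h3').find('a')['title']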

Then you would first parse the page, find all the book elements, and make each of them an instance of BookParser.

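import requests
from bs4 import BeautifulSoup

page = requests.get("http://books.toscrape.com/").content
soup = BeautifulSoup(page, 'html.parser')
book_elements = soup.find_all("li", attrs={"class": 'col-xs-6'})
books = []
for be in book_elements:
    books.append(BookParser(be))
books[0].getName()  # A Light in the Attic
books[1].getName()  # Tipping the Velvet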
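
To see why the search has to start from the individual element, here is a small comparison of a page-level lookup with an element-level one. The two-item HTML string is a trimmed stand-in I made up for illustration; the real entries carry more markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed version of the book list.
TWO_BOOKS = """
<ol>
  <li class="col-xs-6"><h3><a title="A Light in the Attic">A Light in the ...</a></h3></li>
  <li class="col-xs-6"><h3><a title="Tipping the Velvet">Tipping the Velvet</a></h3></li>
</ol>
"""

soup = BeautifulSoup(TWO_BOOKS, "html.parser")
items = soup.find_all("li", attrs={"class": "col-xs-6"})

# Searching from the whole page always finds the first <h3> in the document.
page_level = [soup.find("h3").find("a")["title"] for _ in items]

# Searching from each <li> only looks inside that one book.
item_level = [li.find("h3").find("a")["title"] for li in items]

print(page_level)
print(item_level)
```

The page-level list repeats the first title once per item, which is the behaviour the original BookParser would show even if the loop did run.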
