
I get an error when I use a for loop in my web scraping code.

Here is my code for the app.py file:

page_content = requests.get("http://books.toscrape.com/").content
parser = BookParser(page_content)

containers = parser.Content()
results = []

for container in containers:

    name = container.getName()
    link = container.getLink()
    price = container.getPrice()
    rating = container.getRating()

    results.append({'name': name,
                'link': link,
                'price': price,
                'rating': rating
                })  

print(results[4])

And this is the code for the BookParser class that is called:

class BookParser(object):
    RATINGS = {
        'One': 1,
        'Two': 2,
        'Three': 3,
        'Four': 4,
        'Five': 5
    }
    
    def __init__(self, page):
        self.soup = BeautifulSoup(page, 'html.parser')

    def Content(self):
        return self.soup.find_all("li",attrs={"class": 'col-xs-6'})

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        return self.soup.find('h3').find('a')['href']
    
    def getPrice(self):
        locator = BookLocator.PRICE
        price = self.soup.select_one(locator).string
        
        pattern = r"[0-9\.]*"
        validator = re.findall(pattern, price)

        return float(validator[1])

    def getRating(self):
        locator = BookLocator.STAR_RATING
        rating = self.soup.select_one(locator).attrs['class']

        rating_number = BookParser.RATINGS.get(rating[1])
        return rating_number

and finally, this is the error:

Traceback (most recent call last):
  File "c:\Users\Utkarsh Kumar\Documents\Projects\milestoneP4\app.py", line 13, in <module>
    name = container.getName()
TypeError: 'NoneType' object is not callable

I don't understand why the getName() function is returning a NoneType.

Any help would be highly appreciated, as I am pretty new to web scraping.

PS: Using it without the for loop works fine, something like this:

name = parser.getName()
print(name)
  • parser is a BookParser object, which is why calling .getName() on it works: you defined that method yourself. parser.Content() returns a list of BeautifulSoup elements, which don't have a .getName() method. Commented Aug 23, 2020 at 6:47
  • To save you time, I would consider rethinking your strategy. All of the functions in BookParser will return the same values every time. Since you are not modifying self.soup, parser.getName() will return A Light in the Attic every time, which is probably not what you want. In fairness, it's a pretty cool book though. Commented Aug 23, 2020 at 6:52
  • What might be a better alternative to this? I searched around the web and found that most examples do everything in one file, and that's not what I want. I want separate files to work with. Commented Aug 23, 2020 at 6:55
  • I was assuming you were trying to get every book off the page, but maybe I'm wrong. What are you trying to do? Commented Aug 23, 2020 at 6:58
  • Yeah, your assumption is right. I want to get all the books and store it as a dictionary in the results list. Commented Aug 23, 2020 at 6:59

2 Answers


containers = parser.Content() gives you a list of BS4 elements, not a BookParser instance. You can verify this using print(type(containers)).
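
To see where the 'NoneType' message comes from, here is a minimal sketch. It assumes a tiny hand-written HTML snippet (not the real page) and relies on BeautifulSoup's behaviour of treating an unknown attribute name on a Tag as a lookup for a child tag with that name:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the real page (an assumption for this sketch).
html = """
<ol>
  <li class="col-xs-6"><h3><a href="a.html" title="Book A">Book A</a></h3></li>
  <li class="col-xs-6"><h3><a href="b.html" title="Book B">Book B</a></h3></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("li", attrs={"class": "col-xs-6"})

print(type(containers))  # a list-like bs4 ResultSet, not a BookParser
first = containers[0]
print(type(first))       # a bs4 Tag

# A Tag has no getName method. The attribute access is treated as a search
# for a child <getName> tag, which does not exist, so the result is None.
print(first.getName)

# Calling that None is what raises the TypeError from the question.
try:
    first.getName()
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

So getName() is never actually run inside the loop: the name resolves to None on the Tag, and the call on None fails.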

To keep calling .getName(), create a new class called Book, move .getName() and the other related methods into it, and pass each list item returned by the .Content() method (i.e. each li.col-xs-6 element) to its constructor. You can then call book.getName() on each instance.

Something like this should work:

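import requests
from bs4 import BeautifulSoup


class Book:
    def __init__(self, el):
        # el is a single li.col-xs-6 element, not the whole page
        self.soup = el

    def getName(self):
        return self.soup.find('h3').find('a')['title']

    def getLink(self):
        ...
    
    def getPrice(self):
        ...

    def getRating(self):
        ...


def get_books(html: str) -> list:
    soup = BeautifulSoup(html, 'html.parser')
    return [Book(it) for it in soup.find_all("li",attrs={"class": 'col-xs-6'})]


# Same URL as in the question
html = requests.get("http://books.toscrape.com/").text

for b in get_books(html):
    print(b.getName())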
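
For completeness, here is one way the remaining methods could be filled in. This is only a sketch under stated assumptions: the selectors p.price_color and p.star-rating are guesses based on the page markup (the question's BookLocator values are not shown), the price regex is a swapped-in alternative to the one in the question, and the second item in SAMPLE is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


class Book:
    """Wraps one <li> element so every lookup is scoped to a single book."""

    def __init__(self, el):
        self.el = el

    def getName(self):
        return self.el.find("h3").find("a")["title"]

    def getLink(self):
        return self.el.find("h3").find("a")["href"]

    def getPrice(self):
        # Assumed selector; BookLocator.PRICE is not shown in the question.
        text = self.el.select_one("p.price_color").get_text()
        return float(re.search(r"\d+(?:\.\d+)?", text).group())

    def getRating(self):
        # Assumed selector; the class list looks like ['star-rating', 'Three'].
        classes = self.el.select_one("p.star-rating")["class"]
        return RATINGS.get(classes[1])


def get_books(html):
    soup = BeautifulSoup(html, "html.parser")
    return [Book(li) for li in soup.find_all("li", attrs={"class": "col-xs-6"})]


# Trimmed sample markup; the second item is hypothetical.
SAMPLE = """
<li class="col-xs-6 col-sm-4"><article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
         title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article></li>
<li class="col-xs-6 col-sm-4"><article class="product_pod">
  <p class="star-rating One"></p>
  <h3><a href="catalogue/example/index.html" title="Example Book">Example</a></h3>
  <p class="price_color">£10.00</p>
</article></li>
"""

results = [
    {"name": b.getName(), "link": b.getLink(),
     "price": b.getPrice(), "rating": b.getRating()}
    for b in get_books(SAMPLE)
]
print(results)
```

Because each Book holds its own li element, the loop produces a different dictionary per book instead of repeating the first one.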

Each book in the list is in these li elements:

<li class="col-xs-6 col-sm-4 col-md-3 col-lg-3">
    <article class="product_pod">
       
            <div class="image_container">               
                    
                    <a href="catalogue/a-light-in-the-attic_1000/index.html"><img src="media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg" alt="A Light in the Attic" class="thumbnail"></a>
                       
            </div>
       
                <p class="star-rating Three">
                    <i class="icon-star"></i>
                    <i class="icon-star"></i>
                    <i class="icon-star"></i>
                    <i class="icon-star"></i>
                    <i class="icon-star"></i>
                </p>
            <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
            <div class="product_price">
        <p class="price_color">£51.77</p>   
<p class="instock availability">
    <i class="icon-ok"></i>  
        In stock
</p>
    
    <form>
        <button type="submit" class="btn btn-primary btn-block" data-loading-text="Adding...">Add to basket</button>
    </form>            
            </div>    
    </article>
</li>

Sorry for the bad formatting, but you get the point. Make a class that operates on a single list element rather than on the soup object, which is your whole page. For example:

class BookParser:
    def __init__(self, book_item ):
        self.book_item = book_item
    def getName( self ):
        return self.book_item.find( path_to_name ).text 

Then you would first parse the page, find all the <li> book elements, and make each of them an instance of BookParser:

# page_content is the downloaded HTML (as in the question), not a URL
soup = BeautifulSoup( page_content, 'html.parser' )
book_elements = soup.find_all( path_to_book_elements )
books = []
for be in book_elements:
    books.append( BookParser( be ))
books[0].getName() # A Light in the Attic
books[1].getName() # Tipping the Velvet
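
One more detail worth checking once each parser wraps a single item is the price extraction from the question. The pattern [0-9\.]* can match the empty string, so re.findall returns empty entries and the index [1] only works while the text starts with exactly one non-numeric character such as £. A possible alternative, with parse_price being a hypothetical helper name, is to search for the first number directly:

```python
import re


def parse_price(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


# The original pattern yields empty strings around the number.
original = re.findall(r"[0-9\.]*", "£51.77")
print(original)               # ['', '51.77', '']

print(parse_price("£51.77"))  # 51.77
print(parse_price("51.77"))   # 51.77 (indexing [1] on findall would give '')
print(parse_price("n/a"))     # None
```

Returning None for text without a number is a design choice for this sketch; raising an exception would be just as reasonable.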
    
    
    