Extract subclass from class using beautifulsoup

Question

I'm working with the following HTML snippet from a page on Goodreads using Python 3.6.3:

<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>, <div class="quoteText">

I used BeautifulSoup to scrape the HTML and isolate just the "quoteText" class seen in the snippet above. Now, I want to save the quote and author name as separate strings. I was able to get the author name using

(quote_tag.find(class_="authorOrTitle")).text

I'm not sure how to do the same for the quote. I'm guessing I need a way to remove the nested author tag (the "subclass") from my output, so I tried using the extract method:

quote.extract(class_="authorOrTitle")

but I got an error saying that extract() got an unexpected keyword argument 'class_'. Is there another way to do what I'm trying to do?

This is my first time posting on here so I apologize if the post doesn't meet particular specificity/formatting/other standards.

Smart Manoj · Accepted Answer · 2018-05-28 06:48:07Z

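PageElement.extract() removes a tag or string from the tree and returns the tag or string that was extracted. It does not accept search arguments such as class_, so find the element you want to remove first and then call extract() on it: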

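from bs4 import BeautifulSoup

html = '''<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  -
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>, <div class="quoteText">'''
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
# Find the author link, then remove it from the tree; extract() returns the removed tag.
author = soup.find(class_="authorOrTitle").extract()
print(author.text)  # the author name
print(soup.text)    # the remaining text: the quote and the leftover separator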
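
If you want the quote and the author as two clean strings without modifying the tree, one possible alternative is to read the first non-empty string inside the div. This is only a sketch: the helper name split_quote is made up for illustration, it assumes the quote is always the first piece of text in the quoteText div, and it uses the built-in "html.parser" rather than lxml.

```python
from bs4 import BeautifulSoup

html = '''<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>'''


def split_quote(quote_tag):
    """Return (quote, author) for one quoteText div (hypothetical helper)."""
    # The author lives in the nested <a class="authorOrTitle"> tag.
    author = quote_tag.find(class_="authorOrTitle").get_text(strip=True)
    # Assumption: the quote is the first non-empty string inside the div.
    # stripped_strings yields each string with surrounding whitespace removed.
    quote = next(quote_tag.stripped_strings)
    return quote, author


soup = BeautifulSoup(html, "html.parser")
quote, author = split_quote(soup.find(class_="quoteText"))
print(quote)
print(author)
```

Unlike extract(), this leaves the parsed document untouched, which may be convenient if you still need the author link (for example its href) later.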
