I'm working with the following HTML snippet from a page on Goodreads using Python 3.6.3:

<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>, <div class="quoteText">

I used BeautifulSoup to scrape the HTML and isolate just the "quoteText" class seen in the snippet above. Now, I want to save the quote and author name as separate strings. I was able to get the author name using

(quote_tag.find(class_="quoteText")).text

I'm not sure how to do the same for the quote. I'm guessing I need a way to remove the child tag from my output, so I tried using the extract method:

quote.extract(class_="authorOrTitle")

but I got an error saying extract() got an unexpected keyword argument 'class_'. Is there another way to do what I'm trying to do?

This is my first time posting on here so I apologize if the post doesn't meet particular specificity/formatting/other standards.

1 Answer

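PageElement.extract() removes a tag or string from the tree and returns the tag or string that was extracted. It does not accept search arguments such as class_, which is why your call failed. Find the tag first, then call extract() on it: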

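from bs4 import BeautifulSoup

# Sample markup from the question
a = '''<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>, <div class="quoteText">'''
s = BeautifulSoup(a, 'lxml')  # the 'lxml' parser requires the lxml package
# Locate the author tag, remove it from the tree, and keep its text.
author = s.find(class_="authorOrTitle").extract().text
# The remaining text is the quote plus the trailing dash and surrounding whitespace.
print(s.text)
print(author)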
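If you would rather not modify the tree, here is a minimal sketch of an alternative that reads the quote and the author as two separate strings. It assumes the quote is always the first non-empty text node inside the "quoteText" div, as in your snippet, and it uses the built-in "html.parser" instead of lxml only to avoid the extra dependency. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br/>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>'''

soup = BeautifulSoup(html, "html.parser")
quote_tag = soup.find(class_="quoteText")

# The author is the text of the nested <a class="authorOrTitle"> tag.
author = quote_tag.find(class_="authorOrTitle").get_text(strip=True)

# stripped_strings yields each text node with surrounding whitespace removed,
# skipping whitespace-only nodes. Assumption: the quote comes first.
quote = next(quote_tag.stripped_strings)

print(quote)   # “Don't cry because it's over, smile because it happened.”
print(author)  # Dr. Seuss
```

If you also want to drop the curly quotation marks, quote.strip("“”") should do it.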