I am scraping an HTML file using BeautifulSoup in Python. I want to delete everything that comes after a particular word once it is found.

Example:

<div class="content">

<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>

</div>

I want to delete everything after Page 3, so that the result is:

<div class="content">

<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>

</div>

I have tried the following:

p = soup.findAll('p')
if len(p) > 3 :
   d = p[3]
   while d:
       e = d.next
       d.extract()
       d = e

Replacing d.extract() with del(d) does not work either. Please help.

  • Exactly how do you want to delete this? Just that section, or everything down the rest of the page, including closing tags? Commented Apr 27, 2011 at 19:48
  • The rest of the HTML page, but I want to keep the closing tags. Commented Apr 27, 2011 at 19:51

1 Answer

Try this:

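p = soup.findAll('p')
# Remove paragraphs from the end until only the first three remain.
while len(p) > 3:
    last_p = p.pop()    # take the last remaining <p> off the list
    last_p.extract()    # detach it from the parse tree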
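
The loop above removes only the surplus <p> tags. If the goal is the one stated in the comments (drop the rest of the page after the marker while keeping the closing tags), one possible approach is sketched below. It assumes the newer bs4 package and its find_all/string API rather than the BeautifulSoup 3 module the question appears to use. The helper name truncate_after and the trailing "Footer" paragraph are hypothetical and added only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, plus a hypothetical footer after the <div>
# to show that content outside the container is removed as well.
HTML = """<div class="content">

<p> Page 1 </p>
<p> Page 2 </p>
<p> Page 3 </p>
<p> Page 4 </p>
<p> Page 5 </p>

</div>
<p> Footer </p>"""


def truncate_after(soup, word):
    """Remove everything that follows the first <p> whose text contains `word`."""
    marker = soup.find("p", string=lambda s: s is not None and word in s)
    if marker is None:
        return soup  # word not found: leave the document untouched

    # Walk from the marker up through its ancestors and drop each node's
    # following siblings. The ancestors themselves are never extracted,
    # so their closing tags survive.
    for node in [marker] + list(marker.parents):
        # Copy the siblings into a list first: extracting while iterating
        # over the live generator would cut the traversal short.
        for sibling in list(node.next_siblings):
            sibling.extract()
    return soup


soup = truncate_after(BeautifulSoup(HTML, "html.parser"), "Page 3")
remaining = [p.get_text(strip=True) for p in soup.find_all("p")]
print(soup)
```

Matching on the text of a <p> is one reading of "after find a word"; if the marker can appear inside other tags, the soup.find call would need a broader search.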