I am trying to parse this HTML with BeautifulSoup:

<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>

I want to get the data for Tuesday only: <strong>Tuesday</strong> Some info here...<br /> But since there is no wrapper div around each day, I am having difficulty extracting just this data. Any suggestions?

1 Answer

How about this approach:

from bs4 import BeautifulSoup

html = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""
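soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
# Take the first text node that follows <strong>Tuesday</strong>
result = soup.find('strong', string='Tuesday').find_next_sibling(string=True)
print(result)  # in Python 3 the text node is already a str, so no decode() is needed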

Output:

 Some info here...

Update based on comment:

Basically, you can keep taking the next sibling text after <strong>Tuesday</strong> until the element that follows the text is another <strong> element, or until there are no more siblings.

from bs4 import BeautifulSoup

html = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br /> and then some <br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""
soup = BeautifulSoup(html, 'html.parser')
result = soup.find('strong', string='Tuesday').find_next_sibling(string=True)
# Walk the text nodes; stop once the element after a text node is the
# next <strong>, or once there are no more siblings.
while result is not None:
    nextSibling = result.find_next_sibling()
    if nextSibling is None or nextSibling.name == 'strong':
        break
    print(result)
    result = nextSibling.find_next_sibling(string=True)

Output:

 Some info here...
 and then some 
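
If you need every node between <strong>Tuesday</strong> and the next <strong>, markup included, one possible variation is to iterate over next_siblings and stop at the first <strong>. This is only a sketch: section_after is a hypothetical helper name (it is not part of BeautifulSoup), and it assumes each day is introduced by a <strong> that is a direct sibling of the text and <br /> nodes, as in the sample HTML.

```python
from itertools import takewhile

from bs4 import BeautifulSoup
from bs4.element import Tag


def section_after(soup, day):
    """Hypothetical helper: nodes after <strong>day</strong> up to the next <strong>."""
    start = soup.find('strong', string=day)
    if start is None:
        return []  # no such day in the document

    def not_strong(node):
        # Text nodes are not Tag instances, so only real <strong> tags stop the walk.
        return not (isinstance(node, Tag) and node.name == 'strong')

    return list(takewhile(not_strong, start.next_siblings))


html = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br /> and then some <br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""
soup = BeautifulSoup(html, 'html.parser')
nodes = section_after(soup, 'Tuesday')

# Markup for the whole section, <br> tags included
markup = ''.join(str(node) for node in nodes)
# Text only, with whitespace-only nodes dropped
text = ' '.join(n.strip() for n in nodes if not isinstance(n, Tag) and n.strip())
print(markup)
print(text)
```

Collecting the nodes first lets you decide afterwards whether you want the raw markup or just the text, and it does not depend on how many <br /> tags appear within a day.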

3 Comments

Yes, but it will only include html up to the first <br /> tag, I need everything from <strong> to next <strong>.
user1121487 Your original question was what you got in the first answer "get the data for Tuesday only: <strong>Tuesday</strong> Some info here...<br />". If you wanted "everything from <strong> to next <strong>" you should have made that clear originally. @har07's original answer satisfied what you originally asked.
I think it was very clear from the structure in the example that I needed everything from strong to strong, which is everything for Tuesday, since you cannot tell how many br's etc. there will be. @serk
