BeautifulSoup parse unstructured html

Question

Trying to parse this html with BeautifulSoup:

<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>

I want to get the data for Tuesday only: Tuesday Some info here... But since there is no wrapper div around each day, I am having difficulty extracting just this data. Any suggestions?

har07 · Accepted Answer · 2015-06-27 14:21:11Z

3

How about this:

from bs4 import BeautifulSoup

html = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""
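# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
# first text node that follows <strong>Tuesday</strong>
result = soup.find('strong', string='Tuesday').find_next_sibling(string=True)
# on Python 3 the text node is already a str, so no decode is needed
print(result)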

Output:

 Some info here...
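
For what it's worth, the chained lookup above raises AttributeError when no matching <strong> exists, because find returns None in that case. A minimal sketch of a guarded version follows, assuming the built-in html.parser backend; first_text_after is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

HTML = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""


def first_text_after(soup, day):
    """Return the stripped text right after <strong>day</strong>, or None.

    Hypothetical helper for illustration; not part of bs4.
    """
    label = soup.find("strong", string=day)
    if label is None:
        # No <strong> with exactly this text was found
        return None
    text = label.find_next_sibling(string=True)
    return text.strip() if text is not None else None


soup = BeautifulSoup(HTML, "html.parser")
tuesday_text = first_text_after(soup, "Tuesday")
missing_text = first_text_after(soup, "Friday")
print(tuesday_text)
print(missing_text)
```

Stripping the text is a choice made here for readability; drop the strip() call if the leading space matters.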

Update based on comment:

Basically, you can keep taking the next sibling text node after Tuesday until the element that follows the text is another <strong> element or None.

from bs4 import BeautifulSoup

html = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br /> and then some <br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""
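soup = BeautifulSoup(html, 'html.parser')
# first text node that follows <strong>Tuesday</strong>
result = soup.find('strong', string='Tuesday').find_next_sibling(string=True)
next_sibling = result.find_next_sibling()
# keep printing text nodes until the next tag is another <strong> or there is none
while next_sibling and next_sibling.name != 'strong':
    print(result)
    result = next_sibling.find_next_sibling(string=True)
    if result is None:
        # no text after the last tag, so stop instead of raising AttributeError
        break
    next_sibling = result.find_next_sibling()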

Output:

 Some info here...
 and then some
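
If the goal is everything between one <strong> and the next, however many <br /> tags sit in between, another option is to walk next_siblings directly. This is only a sketch under a few assumptions: the html.parser backend, the same flat structure as the example, and a hypothetical helper name text_until_next_strong that is not part of bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

HTML = """<div class="container">
  <strong>Monday</strong> Some info here...<br /> and then some <br />
  <strong>Tuesday</strong> Some info here...<br /> and then some <br />
  <strong>Wednesday</strong> Some info here...<br />
  ...
</div>"""


def text_until_next_strong(soup, day):
    """Collect the text chunks between <strong>day</strong> and the next <strong>.

    Hypothetical helper for illustration; not part of bs4.
    """
    label = soup.find("strong", string=day)
    if label is None:
        return []
    chunks = []
    for node in label.next_siblings:
        if isinstance(node, Tag):
            if node.name == "strong":
                # Reached the next day's label, so stop collecting
                break
            # <br /> yields an empty string; other tags contribute their text
            text = node.get_text().strip()
        else:
            # Plain text node between tags
            text = str(node).strip()
        if text:
            chunks.append(text)
    return chunks


soup = BeautifulSoup(HTML, "html.parser")
tuesday_chunks = text_until_next_strong(soup, "Tuesday")
missing_chunks = text_until_next_strong(soup, "Friday")
for chunk in tuesday_chunks:
    print(chunk)
```

Returning a list rather than printing leaves it to the caller to join the chunks or keep them separate.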

edited Jun 27, 2015 at 14:21

answered Jun 27, 2015 at 11:58

har07


3 Comments

user1121487 Over a year ago

Yes, but it will only include HTML up to the first tag; I need everything from <strong> to the next <strong>.

serk Over a year ago

user1121487 Your original question was what you got in the first answer: "get the data for Tuesday only: Tuesday Some info here... ". If you wanted "everything from <strong> to the next <strong>", you should have made that clear originally. @har07's original answer satisfied what you originally asked.

user1121487 Over a year ago

I think it was very clear from the structure of the example that I needed everything from strong to strong, which is everything for Tuesday, since you cannot tell how many br's etc. there will be. @serk
