0

I have managed to get the script tag using BeautifulSoup, and then I turned its contents into a JSON object. The information I want is in data['x'], but it is wrapped in <b> tags. Example:

<b>infoiwant</b><br>NA<br>infoinwant</br>columniwant: 123','<b>infoiwant</b><br>NA<br>columniwant: 123'</br>columniwant: 123

How would I go about getting the info out of these <b> elements?

1
  • .find_all('b') should be enough to get the b tags. Use br for the others. Commented Sep 10, 2020 at 21:19

2 Answers

2

Before converting to JSON, can you use BeautifulSoup's get_text() method? Maybe something like:

soup.find('b').get_text()
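
If the markup only exists as a string (for example, the value stored under data['x'] once the script contents have been parsed), that string can be handed to BeautifulSoup as its own fragment. A minimal sketch, assuming the sample string from the question; the variable names are illustrative, not from the original code:

```python
from bs4 import BeautifulSoup

# Assumption: this stands in for the string held in data['x'].
fragment = "<b>infoiwant</b><br>NA<br>infoinwant</br>columniwant: 123"

# Parse the string itself as HTML, independent of the outer page.
inner = BeautifulSoup(fragment, "html.parser")

# Collect the text of every <b> element in the fragment.
bold_texts = [b.get_text(strip=True) for b in inner.find_all("b")]
print(bold_texts)
```

If data['x'] holds a list of such strings, the same parsing step could be applied to each item in a loop.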

1 Comment

The <b> and </br> tags are inside a script tag in the HTML, so calling find('b') on the page doesn't work for me.
0

One way to extract the data from the <script> tag is to use the re module:

import re
from bs4 import BeautifulSoup


html_text = """
<script>
var data['x'] = '<b>infoiwant</b><br>NA<br>infoinwant</br>columniwant: 123';
</script>
"""

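# Capture the quoted string assigned to data['x'] inside the script text.
html_data = re.search(r"data\['x'\] = '(.*?)';", html_text).group(1)

# Parse the captured string as its own HTML fragment.
soup = BeautifulSoup(html_data, "html.parser")

# Text of the first <b> element, with surrounding whitespace stripped.
print(soup.find("b").get_text(strip=True))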

Prints:

infoiwant
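
The value after the columniwant: label sits outside any tag, so it may be simpler to pull it out with a regular expression as well. A minimal sketch using only the standard library, assuming the label is always followed by digits; the variable names bold_texts and column_value are illustrative:

```python
import re

# Sample string from the question; stands in for the captured data['x'] value.
html_data = "<b>infoiwant</b><br>NA<br>infoinwant</br>columniwant: 123"

# Text inside every <b>...</b> pair (non-greedy, so each pair matches separately).
bold_texts = re.findall(r"<b>(.*?)</b>", html_data)

# Digits that follow the "columniwant:" label, or None if the label is absent.
match = re.search(r"columniwant:\s*(\d+)", html_data)
column_value = int(match.group(1)) if match else None

print(bold_texts)
print(column_value)
```

A plain regex like this is only reasonable for short, predictable fragments; for nested or irregular markup, parsing with BeautifulSoup is the safer choice.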
