
How can I extract a particular value (i.e., 39.74% in this case) that follows "Proj. EPS Growth (F1)" in the following HTML example with BeautifulSoup? I'm completely new to Python. Thank you!

<div class="high_low_table" id="high_low_table">
</table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>

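from bs4 import BeautifulSoup
import requests

# url holds the address of the page (not given in the question)
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
# .text returns the text of the whole div, i.e. every row at once
data = soup.find('div', class_="high_low_table").text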

  • Do you want only one output, or both values in the table? Commented Aug 20, 2021 at 13:32
  • Line 2 should be <table> and not </table>. See w3schools.com/html/html_tables.asp Commented Aug 20, 2021 at 13:33
  • I want only one output. Commented Aug 20, 2021 at 13:41
  • Is this always the last row? Or are you identifying it by the nearest <th> text? Commented Aug 20, 2021 at 13:44
  • It would be better if you shared the URL. Commented Aug 20, 2021 at 13:46
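Since the first comment asks whether one or both values are needed, here is a minimal sketch that collects every row into a dictionary keyed by its label, so you can print just the F1 entry or any other. It assumes each `<tr>` holds exactly one `<th>` label and one `<td>` value, as in the sample; `rows_to_dict` is a hypothetical helper name, the stray `</table>` from the question is written as `<table>`, and Python's built-in `html.parser` is used so `lxml` is not required.

```python
from bs4 import BeautifulSoup

html = """<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>"""


def rows_to_dict(soup):
    """Hypothetical helper: map each row's <th> label to its <td> value."""
    table_div = soup.find("div", id="high_low_table")
    result = {}
    for tr in table_div.find_all("tr"):
        th, td = tr.find("th"), tr.find("td")
        if th is not None and td is not None:
            # strip=True removes the trailing space inside the <th> text
            result[th.get_text(strip=True)] = td.get_text(strip=True)
    return result


soup = BeautifulSoup(html, "html.parser")
values = rows_to_dict(soup)
print(values["Proj. EPS Growth (F1)"])  # 39.74%
```

Looking values up by label means the result does not depend on the row order.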

3 Answers


I have put your HTML into a string:

html="""<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

We can use the re module to find the specific tag by passing a compiled pattern to find() as the text to match:

import re
# string= (the newer name for text=) matches the tag's text against the pattern
data = soup.find("th", class_="alpha", string=re.compile("F1"))

After finding the specific tag, you can get the <td> that follows it with the find_next() method:

prj=data.get_text(strip=True)
value=data.find_next("td").get_text()

print(prj,value,sep="\n")

Output:

Proj. EPS Growth (F1)
39.74%
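One caveat: `find()` returns `None` when nothing matches, so `data.find_next("td")` raises `AttributeError` if the label is missing (for example, after a page layout change). Below is a hedged sketch that wraps the same approach in a function returning `None` instead. `get_value` is a hypothetical name, and the pattern is built with `re.escape` so the dot and parentheses in the full label are matched literally rather than as regex syntax.

```python
import re
from bs4 import BeautifulSoup

html = """<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>"""


def get_value(soup, label):
    """Hypothetical helper: return the <td> text after the <th> containing label."""
    # re.escape makes ".", "(" and ")" in the label match literally
    th = soup.find("th", class_="alpha", string=re.compile(re.escape(label)))
    if th is None:
        return None  # label not present on the page
    td = th.find_next("td")
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(html, "html.parser")
print(get_value(soup, "Proj. EPS Growth (F1)"))  # 39.74%
print(get_value(soup, "Proj. EPS Growth (F2)"))  # None
```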

In this particular case you have two th elements with identical attributes, so you need to inspect the text of those elements to identify the one you're interested in. Try this:

from bs4 import BeautifulSoup as BS

HTML = '''
<html>
<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>
</html>
'''

soup = BS(HTML, 'html.parser')
for th in soup.find_all('th', attrs={'class': 'alpha', 'scope': 'row'}):
    if 'Proj. EPS Growth (F1)' in th.text:
        print(th.find_next('td').text)
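The same text-based lookup can also be written as a single CSS selector. This is a sketch assuming soupsieve 2.1 or newer (the selector engine BeautifulSoup uses for `select_one()`), which provides the non-standard `:-soup-contains()` pseudo-class; `+ td` then picks the `<td>` element immediately after the matching `<th>`.

```python
from bs4 import BeautifulSoup

HTML = """
<html>
<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")
# :-soup-contains() matches the <th> whose text contains the label;
# "+ td" selects the <td> element that directly follows that <th>.
td = soup.select_one('#high_low_table th:-soup-contains("Proj. EPS Growth (F1)") + td')
value = td.get_text(strip=True) if td is not None else None
print(value)  # 39.74%
```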


This is how you can do it for the given HTML code.

  • Select the second row <tr> (because that is where the data you need is) inside the div with id='high_low_table'.

    trs = soup.find('div', attrs= {'id': 'high_low_table'}).find_all('tr')[1]
    
  • Print the text inside the <td> tag, i.e. 39.74%.

    print(trs.find('td').text)
    
  • Print the text inside the <th> tag, i.e. Proj. EPS Growth (F1).

    print(trs.find('th').text)
    

Here is the complete code:

from bs4 import BeautifulSoup

s = '''<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>'''

soup = BeautifulSoup(s, 'lxml')
trs = soup.find('div', attrs= {'id': 'high_low_table'}).find_all('tr')[1]
print(trs.find('td').text)
print(trs.find('th').text)

Output:

39.74%
Proj. EPS Growth (F1)
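Picking the row by position (`[1]`) only works while the F1 row stays second, which is what the comment asking "Is this always the last row?" is getting at. A hedged sketch, assuming the label text itself is stable: check the `<th>` before trusting the row, so a layout change yields `None` instead of the wrong number. `value_at_row` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

s = """<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>"""


def value_at_row(soup, index, expected_label):
    """Hypothetical helper: return row `index`'s <td> text only if its <th> matches."""
    rows = soup.find("div", attrs={"id": "high_low_table"}).find_all("tr")
    if not -len(rows) <= index < len(rows):
        return None  # index out of range (also covers negative indexes)
    th, td = rows[index].find("th"), rows[index].find("td")
    if th is None or td is None or th.get_text(strip=True) != expected_label:
        return None  # this row is not the one we expected
    return td.get_text(strip=True)


soup = BeautifulSoup(s, "html.parser")
print(value_at_row(soup, 1, "Proj. EPS Growth (F1)"))   # 39.74%
print(value_at_row(soup, -1, "Proj. EPS Growth (F1)"))  # 39.74% (last row)
print(value_at_row(soup, 0, "Proj. EPS Growth (F1)"))   # None
```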
