How to extract data in HTML table with BeautifulSoup

Question

How can I extract a particular value (i.e., 39.74% in this case) that follows "Proj. EPS Growth (F1)" in the following HTML example with BeautifulSoup? I'm completely new to Python. Thank you!

<div class="high_low_table" id="high_low_table">
</table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>

from bs4 import BeautifulSoup
import requests
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
data = soup.find('div', class_="high_low_table").text

Do you want only one value as output, or both values from the table?
– Bhavya Parikh, Commented Aug 20, 2021 at 13:32
Line 2 should be <table> and not </table>. See w3schools.com/html/html_tables.asp
– David, Commented Aug 20, 2021 at 13:33
Is this always the last row? Or are you identifying it by the nearest <th> text?
– ggorlen, Commented Aug 20, 2021 at 13:44

Bhavya Parikh · Accepted Answer · 2021-08-20 14:09:35Z

1

I have taken your data as HTML:

html="""<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

We can use the re module to find a specific element by passing a compiled regular expression to the find method, which matches it against the element's text:

import re
# `string=` is the current name of this argument; older bs4 releases called it `text=`.
data = soup.find("th", class_="alpha", string=re.compile("F1"))

After finding the specific tag, you can find the td using the find_next() method:

prj=data.get_text(strip=True)
value=data.find_next("td").get_text()

print(prj,value,sep="\n")

Output:

Proj. EPS Growth (F1)
39.74%
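Two details of the approach above are worth illustrating: `find` returns `None` when nothing matches, and `find_next` searches everything that follows in document order, so it could pick up a `td` from a later row if the matched row had none. The following is only a sketch under assumptions: the helper name `value_for_label` is hypothetical, the HTML is the cleaned-up snippet from the question, and `html.parser` is used instead of `lxml` purely to avoid an extra dependency.

```python
import re
from bs4 import BeautifulSoup

# Cleaned-up copy of the table from the question (assumed well-formed here).
HTML = """
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
"""


def value_for_label(soup, label):
    """Return the text of the <td> beside the <th> containing `label`, or None."""
    # re.escape keeps the parentheses in "(F1)" from being read as a regex group.
    th = soup.find("th", class_="alpha", string=re.compile(re.escape(label)))
    if th is None:
        return None
    # find_next_sibling only considers elements sharing the same parent <tr>.
    td = th.find_next_sibling("td")
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(value_for_label(soup, "Proj. EPS Growth (F1)"))  # 39.74%
print(value_for_label(soup, "Dividend Yield"))         # None
```

Whether the stricter sibling lookup is appropriate depends on the real page; if the value cell is not a direct sibling of the header cell, `find_next` remains the simpler choice.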

edited Aug 20, 2021 at 14:09

answered Aug 20, 2021 at 13:55

Bhavya Parikh


user2668284 · Accepted Answer · 2021-08-20 13:54:24Z

0

In this particular case you have two th elements with identical attributes, so you need to inspect the text of those elements to identify the one you're interested in. Try this:

from bs4 import BeautifulSoup as BS

HTML = '''
<html>
<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>
</html>
'''

soup = BS(HTML, 'html.parser')
for th in soup.find_all('th', attrs={'class': 'alpha', 'scope': 'row'}):
    if 'Proj. EPS Growth (F1)' in th.text:
        print(th.find_next('td').text)
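If both values are wanted (as one of the comments on the question asks), the same loop can collect every label and value into a dictionary. This is a minimal sketch, assuming each `th` of interest is followed by its value `td`; the variable name `rows` is only illustrative.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Map each header label to the text of the <td> that follows it.
rows = {}
for th in soup.find_all("th", attrs={"class": "alpha", "scope": "row"}):
    td = th.find_next("td")
    if td is not None:
        # strip=True drops the trailing space inside the <th> text.
        rows[th.get_text(strip=True)] = td.get_text(strip=True)

print(rows["Proj. EPS Growth (F1)"])  # 39.74%
print(rows["Proj. EPS Growth (Q1)"])  # 19.56%
```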

answered Aug 20, 2021 at 13:54

user2668284


Ram · Accepted Answer · 2021-08-20 13:56:57Z

This is how you do it for the given HTML Code.

Select the second row <tr> (because that is where the data you need is present) that is inside a div with id='high_low_table'.
```
trs = soup.find('div', attrs= {'id': 'high_low_table'}).find_all('tr')[1]
```
Print the text inside the <td> tag, i.e. 39.74%:
```
print(trs.find('td').text)
```
Print the text inside the <th> tag, i.e. Proj. EPS Growth (F1):
```
print(trs.find('th').text)
```

Here is the complete code:

from bs4 import BeautifulSoup

s = '''<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>'''

soup = BeautifulSoup(s, 'lxml')
trs = soup.find('div', attrs= {'id': 'high_low_table'}).find_all('tr')[1]
print(trs.find('td').text)
print(trs.find('th').text)

Output:

39.74%
Proj. EPS Growth (F1)
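Selecting by position works only as long as the row stays at index 1. If the row order might change, one option is to keep the search scoped to the same div but pick the row by its label, and convert the percentage to a number at the same time. This is a sketch only: the function name `percent_by_label` is hypothetical, the opening `<table>` tag is assumed to be well-formed, and `html.parser` stands in for `lxml`.

```python
from bs4 import BeautifulSoup

HTML = """<div class="high_low_table" id="high_low_table">
<table>
<tbody>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (Q1) </th>
<td>19.56%</td>
</tr>
<tr>
<th class="alpha" scope="row">Proj. EPS Growth (F1) </th>
<td>39.74%</td>
</tr>
</tbody>
</table>
</div>"""


def percent_by_label(soup, label):
    """Return the row's percentage as a float, or None if the label is absent."""
    container = soup.find("div", id="high_low_table")
    if container is None:
        return None
    for tr in container.find_all("tr"):
        th = tr.find("th")
        td = tr.find("td")
        if th is None or td is None:
            continue  # skip rows that lack a header or a value cell
        if th.get_text(strip=True) == label:
            # "39.74%" -> 39.74
            return float(td.get_text(strip=True).rstrip("%"))
    return None


soup = BeautifulSoup(HTML, "html.parser")
print(percent_by_label(soup, "Proj. EPS Growth (F1)"))  # 39.74
```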
