How to extract and print the text inside all <td> tags in a table with Python

Question

I would like to extract the text (a number) from inside all <td> tags within a table in Python. I am new to Python, so please excuse the messiness of my code. Here is my code for this section:

r = requests.get(saurl)
soupsa = BeautifulSoup(r.text, 'html.parser')
cases_table = soupsa.find('table')
for state in cases_table.find_all('tbody'):
    rows = state.find_all('tr')
    for row in rows:
        numcases = row.find('class="numeric"')
        aunumcases = row.find('td class="numeric"')
        print(aunumcases)

The HTML table that I am trying to scrape looks like this:

<tbody>
      <tr>
        <th>
          Location
        </th>
        <th class="text--align-right">
          Confirmed cases*            </th>
      </tr>
      <tr>
        <td>
            <p>Australian Capital Territory</p>
        </td>
        <td class="numeric">
            <p><span>78</span></p>
        </td>
      </tr>
      <tr>
        <td>
          <p>New South Wales</p>
        </td>
        <td class="numeric">
          2,032            </td>
      </tr>
      <tr>
        <td>
          <p>Northern Territory</p>
        </td>
        <td class="numeric">
            14            </td>
      </tr>
      <tr>
        <td>
          <p>Queensland</p>
        </td>
        <td class="numeric">
          689            </td>
      </tr>
      <tr>
        <td>
          <p>South Australia</p>
        </td>
        <td class="numeric">
          305            </td>
      </tr>
      <tr>
        <td>
          <p>Tasmania</p>
        </td>
        <td class="numeric">
          65            </td>
      </tr>
      <tr>
        <td>
          <p>Victoria</p>
        </td>
        <td class="numeric">
          821            </td>
      </tr>
      <tr>
        <td>
          <p>Western Australia</p>
        </td>
        <td class="numeric">
          355            </td>
      </tr>
      <tr>
        <td>
          <p><strong>Total**</strong></p>
        </td>
        <td class="numeric">
          <strong>4,359</strong>
        </td>
      </tr>
    </tbody>

The problem is that when I run the code and print aunumcases, it returns None. Any help would be really appreciated!

And are you expecting a different result for aunumcases vs. numcases? If so, what?
– QHarr, Commented Mar 31, 2020 at 4:51
@QHarr The link to the website is health.gov.au/news/health-alerts/…
– jleo, Commented Mar 31, 2020 at 23:40
@QHarr Yes, I was expecting the same answer from aunumcases and numcases. They were just two different methods that I tried to get a result, but neither worked!
– jleo, Commented Mar 31, 2020 at 23:41
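
A note on why both attempts print None: the first argument to BeautifulSoup's find() is a tag name, so row.find('td class="numeric"') looks for a tag literally named 'td class="numeric"', which does not exist. Attributes are passed separately, for example with the class_ keyword. Below is a minimal sketch of that fix. It parses a trimmed copy of the table from the question (wrapped in a <table> element, which is an assumption about the live page) instead of fetching the URL, and the names html, cases and broken are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question. The surrounding <table>
# element is an assumption; the live page may be structured differently.
html = """
<table>
  <tbody>
    <tr>
      <th>Location</th>
      <th class="text--align-right">Confirmed cases*</th>
    </tr>
    <tr>
      <td><p>Australian Capital Territory</p></td>
      <td class="numeric"><p><span>78</span></p></td>
    </tr>
    <tr>
      <td><p>New South Wales</p></td>
      <td class="numeric">
        2,032            </td>
    </tr>
    <tr>
      <td><p><strong>Total**</strong></p></td>
      <td class="numeric"><strong>4,359</strong></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The call from the question: the whole string is treated as a tag name,
# so nothing matches and find() returns None.
broken = soup.find('td class="numeric"')

cases = {}
for row in soup.find("table").find_all("tr"):
    cells = row.find_all("td")
    if len(cells) < 2:
        continue  # the header row contains only <th> cells
    location = cells[0].get_text(strip=True)
    # Tag name first, attribute filter as a keyword argument.
    number_cell = row.find("td", class_="numeric")
    # get_text() also reaches text nested in <p>, <span> or <strong>.
    cases[location] = int(number_cell.get_text(strip=True).replace(",", ""))

for location, count in cases.items():
    print(location, count)
```

Stripping the thousands separator before calling int() is a choice made for this sketch; if only the raw text is needed, get_text(strip=True) alone is enough.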

QHarr · Accepted Answer · 2020-04-01 06:34:03Z

1

It's a static table, so I would just use pandas. read_html returns a list of DataFrames, one per table on the page, and [0] selects the first:

import pandas as pd

table = pd.read_html('https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers')[0]


2 Comments

jleo Over a year ago

I am not sure what you mean. Do I just print the table after that code?

QHarr Over a year ago

Yes, or filter/subset the table for any specific values you want.
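
To make the "print or subset" suggestion concrete, here is a minimal sketch. It assumes pandas and an HTML parser it can use (such as lxml) are installed. To stay self-contained it reads a trimmed copy of the table from the question (wrapped in a <table> element, which is an assumption) through StringIO rather than the live URL; the variable names are only illustrative.

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the table from the question; the <table> wrapper is an
# assumption, and the live page may contain more rows or more tables.
html = """
<table>
  <tbody>
    <tr><th>Location</th><th class="text--align-right">Confirmed cases*</th></tr>
    <tr><td><p>Australian Capital Territory</p></td><td class="numeric"><p><span>78</span></p></td></tr>
    <tr><td><p>New South Wales</p></td><td class="numeric">2,032</td></tr>
    <tr><td><p><strong>Total**</strong></p></td><td class="numeric"><strong>4,359</strong></td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table> found; take the first.
table = pd.read_html(StringIO(html))[0]
print(table)

# Select columns by position so the sketch does not depend on header text.
locations = table.iloc[:, 0]
counts = table.iloc[:, 1]

# A single value: the count for one location.
nsw = counts[locations == "New South Wales"].iloc[0]
print(nsw)

# A subset: every row except the total.
states_only = table[~locations.str.startswith("Total")]
print(states_only)
```

With the real page, the same selections should apply after replacing StringIO(html) with the URL from the answer, provided the table is still the first one on the page.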
