
I would like to extract the text (numbers) from inside all the tags within a table in Python. I am new to coding in Python, so please excuse the messiness of my code. Here is my code for this section:

import requests
from bs4 import BeautifulSoup

# saurl holds the page URL and is defined earlier in the script
r = requests.get(saurl)
soupsa = BeautifulSoup(r.text, 'html.parser')
cases_table = soupsa.find('table')
for state in cases_table.find_all('tbody'):
    rows = state.find_all('tr')
    for row in rows:
        numcases = row.find('class="numeric"')
        aunumcases = row.find('td class="numeric"')
        print(aunumcases)

The HTML table that I am trying to scrape looks like this:

<tbody>
      <tr>
        <th>
          Location
        </th>
        <th class="text--align-right">
          Confirmed cases*            </th>
      </tr>
      <tr>
        <td>
            <p>Australian Capital Territory</p>
        </td>
        <td class="numeric">
            <p><span>78</span></p>
        </td>
      </tr>
      <tr>
        <td>
          <p>New South Wales</p>
        </td>
        <td class="numeric">
          2,032            </td>
      </tr>
      <tr>
        <td>
          <p>Northern Territory</p>
        </td>
        <td class="numeric">
            14            </td>
      </tr>
      <tr>
        <td>
          <p>Queensland</p>
        </td>
        <td class="numeric">
          689            </td>
      </tr>
      <tr>
        <td>
          <p>South Australia</p>
        </td>
        <td class="numeric">
          305            </td>
      </tr>
      <tr>
        <td>
          <p>Tasmania</p>
        </td>
        <td class="numeric">
          65            </td>
      </tr>
      <tr>
        <td>
          <p>Victoria</p>
        </td>
        <td class="numeric">
          821            </td>
      </tr>
      <tr>
        <td>
          <p>Western Australia</p>
        </td>
        <td class="numeric">
          355            </td>
      </tr>
      <tr>
        <td>
          <p><strong>Total**</strong></p>
        </td>
        <td class="numeric">
          <strong>4,359</strong>
        </td>
      </tr>
    </tbody>

The problem is that when I run the code and print 'aunumcases', it prints 'None'. Any help would be really appreciated!

  • Can you include the actual URL, please? Commented Mar 31, 2020 at 4:40
  • And are you expecting a different result for aunumcases vs. numcases? If so, what? Commented Mar 31, 2020 at 4:51
  • @QHarr The link to the website is health.gov.au/news/health-alerts/… Commented Mar 31, 2020 at 23:40
  • @QHarr And yes, I was expecting the same answer from aunumcases and numcases; they were just two different methods I tried to get a result, but neither worked! Commented Mar 31, 2020 at 23:41

1 Answer


It's a static table, so I would just use pandas:

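import pandas as pd

# read_html returns a list of DataFrames, one per <table> on the page; [0] is the first one.
# It needs an HTML parser backend installed (lxml, or BeautifulSoup4 plus html5lib).
table = pd.read_html('https://www.health.gov.au/news/health-alerts/novel-coronavirus-2019-ncov-health-alert/coronavirus-covid-19-current-situation-and-case-numbers')[0]
print(table)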
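If you would rather keep the BeautifulSoup approach from the question, the None most likely comes from how find() is called: its first argument is a tag name, and attribute filters go in separate arguments such as class_="numeric". A string like 'td class="numeric"' is treated as the name of a tag; no tag has that name, so find() returns None. The sketch below runs on a trimmed copy of the question's HTML (the live page has probably changed since 2020) and assumes each data row has exactly two <td> cells, location first and count second.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the table from the question (header row plus three data rows).
html = """
<table>
<tbody>
  <tr>
    <th>Location</th>
    <th class="text--align-right">Confirmed cases*</th>
  </tr>
  <tr>
    <td><p>Australian Capital Territory</p></td>
    <td class="numeric"><p><span>78</span></p></td>
  </tr>
  <tr>
    <td><p>New South Wales</p></td>
    <td class="numeric">
      2,032            </td>
  </tr>
  <tr>
    <td><p><strong>Total**</strong></p></td>
    <td class="numeric"><strong>4,359</strong></td>
  </tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# The whole string is taken as a tag name, so this matches nothing and prints None.
print(table.find('td class="numeric"'))

cases = {}
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) != 2:  # skip the header row, which uses <th> cells instead of <td>
        continue
    location = cells[0].get_text(strip=True)
    # Tag name and class filter passed separately; get_text() also collects
    # text nested inside <p><span> or <strong>.
    raw = row.find("td", class_="numeric").get_text(strip=True)
    cases[location] = int(raw.replace(",", ""))  # "2,032" -> 2032

for location, count in cases.items():
    print(location, count)
```

Using get_text(strip=True) rather than .string matters here, because some counts sit directly in the <td> while others are wrapped in <p><span> or <strong>.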

2 Comments

I am not sure what you mean. Do I just print the table after that code?
Yes. Or filter/subset the table for any specific values you want.
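As a rough illustration of filtering the result, here is a sketch that builds the DataFrame by hand from the numbers in the question's HTML, so it runs without fetching the page. The column names "Location" and "Confirmed cases*" are an assumption about what read_html would produce from the header cells; read_html's thousands argument defaults to ",", so values like "2,032" should come back as integers.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(...)[0]; values copied from the question's HTML.
table = pd.DataFrame(
    {
        "Location": ["Australian Capital Territory", "New South Wales", "Northern Territory",
                     "Queensland", "South Australia", "Tasmania", "Victoria",
                     "Western Australia", "Total**"],
        "Confirmed cases*": [78, 2032, 14, 689, 305, 65, 821, 355, 4359],
    }
)

# Look up one state's count with a boolean mask on the Location column.
qld = table.loc[table["Location"] == "Queensland", "Confirmed cases*"].item()
print(qld)  # 689

# Drop the "Total**" summary row, then keep only states with more than 500 cases.
states = table[table["Location"] != "Total**"]
over_500 = states[states["Confirmed cases*"] > 500]
print(over_500)

# Sanity check: the per-state counts add up to the Total** row.
print(states["Confirmed cases*"].sum())  # 4359
```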
