I am trying to scrape data from the tables at https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/. Specifically, I want to scrape the 'Metropolitan tram' table. However, the HTML elements aren't structured well, and I am unsure how to identify the table by name and scrape its content.

This is what I have tried:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")


tables = soup.find_all("div", class_="mceTmpl table__wrapper")
for table in tables:
    print("NEXT-------------------------------------------")
    print(table, end="\n"*2)

1 Answer

You can use pandas.read_html() to scrape tables. It is the usual approach for this: it parses the HTML for you (with lxml or BeautifulSoup under the hood) and returns a list of DataFrames, one per table, so you select your table from the list by index.

Alternatively, use CSS selectors with BeautifulSoup:

soup.select('h3:has(a[name="metrotram"]) + div > div:first-of-type tr')

Example

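from io import StringIO

import pandas as pd
import requests

# Fetch the page with an explicit user-agent header.
html = requests.get(
    'https://www.ptv.vic.gov.au/footer/data-and-reporting/network-performance/daily-performance/',
    headers={'user-agent': 'some agent'}
).text

# read_html returns one DataFrame per <table>; index 1 selects the table shown in the output.
# StringIO avoids the deprecation warning newer pandas versions raise for literal HTML strings.
pd.read_html(StringIO(html), header=0)[1]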

Output

   Unnamed: 0                  % timetable delivered  % services on-time at timing points
0  Sunday, 5 February 2023     99.4%                  83.3%
1  Saturday, 4 February 2023   99.4%                  81.8%
2  Friday, 3 February 2023     98.4%                  79.7%
3  Thursday, 2 February 2023   97.9%                  72.8%
4  Wednesday, 1 February 2023  98.9%                  79.1%
5  Tuesday, 31 January 2023    99.0%                  81.4%
6  Monday, 30 January 2023     99.3%                  90.2%
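
The CSS selector alternative mentioned earlier can be tried offline before pointing it at the live page. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical, simplified stand-in for the page markup (the real structure may differ), the "metrotrain" anchor and the helper name rows_for_section are assumptions made for illustration, and only the "metrotram" anchor and the selector itself come from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup; the live page may differ.
# The "metrotrain" section and its placeholder cells are invented for illustration.
SAMPLE_HTML = """
<h3><a name="metrotrain"></a>Metropolitan train</h3>
<div class="mceTmpl table__wrapper">
  <div>
    <table>
      <tr><th></th><th>% timetable delivered</th><th>% services on-time</th></tr>
      <tr><td>placeholder row</td><td>n/a</td><td>n/a</td></tr>
    </table>
  </div>
</div>
<h3><a name="metrotram"></a>Metropolitan tram</h3>
<div class="mceTmpl table__wrapper">
  <div>
    <table>
      <tr><th></th><th>% timetable delivered</th><th>% services on-time at timing points</th></tr>
      <tr><td>Sunday, 5 February 2023</td><td>99.4%</td><td>83.3%</td></tr>
      <tr><td>Saturday, 4 February 2023</td><td>99.4%</td><td>81.8%</td></tr>
    </table>
  </div>
</div>
"""


def rows_for_section(html, anchor_name):
    """Return the table rows (as lists of cell text) that follow the named heading."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as in the answer, with the anchor name made a parameter.
    selector = f'h3:has(a[name="{anchor_name}"]) + div > div:first-of-type tr'
    rows = []
    for tr in soup.select(selector):
        # Collect header and data cells alike, stripped of surrounding whitespace.
        rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])
    return rows


tram_rows = rows_for_section(SAMPLE_HTML, "metrotram")
for row in tram_rows:
    print(row)
```

Because the selector is keyed on the heading's anchor name rather than on the table's position, it should keep working if tables are added or reordered, as long as the heading-then-wrapper layout assumed here holds on the real page.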