Python novice here. I have been learning how to scrape data from various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on Statcast is giving me trouble. I have tried Selenium, and I have also saved a local HTML copy of the page so I can practice without repeatedly sending requests to the Statcast server. The script below does scrape a table from the URL, but I think it's the first table on the page, which is just a game score.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
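from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.options import Options
import pandas as pd
from io import StringIO
# Set your user agent information (Selenium does not read a plain headers dict,
# so it is passed to Firefox as a preference below)
headers = {
    "User-Agent": "FirstName LastName <[email protected]>"
}
# Set the URL of the webpage containing the park factors
# (a string literal cannot span two lines, so it is split into adjacent literals)
url = ("https://baseballsavant.mlb.com/leaderboard/statcast-park-factors?type=year&year=2024&"
       "batSide=L&stat=index_wOBA&condition=All&rolling=")
# Initialize the WebDriver (e.g., for Firefox) with the custom user agent
options = Options()
options.set_preference("general.useragent.override", headers["User-Agent"])
driver = webdriver.Firefox(options=options)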
# Navigate to the URL
driver.get(url)
# Wait until at least one <table> is present (adjust the timeout as needed)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
# Extract the first <table> on the page
table = driver.find_element(By.TAG_NAME, "table")
table_html = table.get_attribute("outerHTML")
# Use pandas to read the HTML table
df = pd.read_html(StringIO(table_html))[0]
# Close the WebDriver
driver.quit()
# Display the DataFrame
print(df)
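A likely reason for getting the game-score table: `presence_of_element_located((By.TAG_NAME, "table"))` is satisfied by whichever `<table>` appears first, so the wait can end before the park factors table exists, and `find_element` then returns that first match. Raising an implicit wait does not change which element matches. Below is a minimal sketch of a more targeted explicit wait. It assumes, without verification against the live page, that the park factors table is a real `<table>` whose `<th>` header cells include the text `wOBACon`. `park_factors_xpath` and `scrape_table_with_header` are hypothetical helper names.

```python
from io import StringIO


def park_factors_xpath(label="wOBACon"):
    """XPath for a <table> that has a <th> whose text contains `label`.

    Assumes the label contains no single quote and that headers are <th>, not <td>.
    """
    return f"//table[.//th[contains(normalize-space(.), '{label}')]]"


def scrape_table_with_header(url, label="wOBACon", timeout=30):
    """Load `url` in Firefox, wait for a table whose header contains `label`,
    and return that table as a DataFrame."""
    # Imported here so park_factors_xpath can be used without Selenium installed.
    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait for the specific table rather than "any <table>", which the
        # score table satisfies immediately. until() returns the located element.
        table = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, park_factors_xpath(label)))
        )
        return pd.read_html(StringIO(table.get_attribute("outerHTML")))[0]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

If the wait times out, the data is probably not rendered as a `<table>` with `<th>` headers, and the XPath (or the approach) needs to change.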
I want to scrape the larger “Park Factors” table that lists every team's stadium along with stats such as "wOBACon" and "BACON". I have tried locating it by the “table” tag, but it never finds that table. When I index the list of tables, I get an error saying the index is out of range. I have also tried By.ID instead of By.TAG_NAME with “parkFactors”, with no luck: it reports that no element with that attribute exists, even though the table is clearly on the page. I tried increasing the implicit wait time to give the dynamically loaded table more time to appear, without success. I have also tried the class names “article-template” and “table-savant”, with no luck. Any assistance is greatly appreciated!
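When no locator finds a table that is visible in the browser, it is worth checking whether the rows reach the page as HTML at all. Some pages ship their leaderboard rows as a JavaScript array inside a `<script>` tag and build the visible table from it in the browser. Whether the Statcast park factors page does this has not been verified here, and the variable name `data` is an assumption. Searching the saved local HTML copy for a team name or for `index_wOBA` should reveal the real variable, if there is one. `extract_js_array` is a hypothetical standard-library helper for that case, and it only works if the array is valid JSON (double-quoted keys and strings).

```python
import json
import re


def extract_js_array(html, var_name="data"):
    """Return the array assigned as `var|let|const <var_name> = [...]` in `html`.

    Returns None if no such assignment is found. Raises json.JSONDecodeError
    if the array is a JavaScript literal that is not valid JSON.
    """
    # Match the assignment and stop right before the opening "[".
    pattern = re.compile(
        r"\b(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*(?=\[)"
    )
    match = pattern.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value and ignores whatever follows it
    # (the trailing ";" and the rest of the script).
    value, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return value
```

If it returns a list of dicts, `pd.DataFrame(rows)` turns it into a table. The HTML can come from the saved local file, from `driver.page_source`, or from `requests.get(url, headers=headers).text`, where the `headers` dict would actually be sent.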