Python novice here. I have been learning how to scrape various baseball sites (Fangraphs, Statcast, Rotowire). I have had success with a few different methods, but the Park Factors table on Statcast is giving me issues. I have tried using Selenium, and I have tried saving a local HTML copy of the page to practice on without repeatedly sending requests to the Statcast server. The script below does scrape a table from the URL, but I think it is only the first table on the page, which is just a game score.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
from io import StringIO

# Set your user agent information
headers = {
    "User-Agent": "FirstName LastName <[email protected]>"
}

# Set the URL of the webpage containing the park factors
url = "https://baseballsavant.mlb.com/leaderboard/statcast-park-factors?type=year&year=2024&batSide=L&stat=index_wOBA&condition=All&rolling="

# Initialize the WebDriver (e.g., for Firefox)
driver = webdriver.Firefox()

# Navigate to the URL
driver.get(url)

# Wait for the table to be loaded (adjust the timeout as needed)
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "table")))

# Extract the table data
table = driver.find_element(By.TAG_NAME, "table")
table_html = table.get_attribute("outerHTML")

# Use pandas to read the HTML table
df = pd.read_html(StringIO(table_html))[0]

# Close the WebDriver
driver.quit()

# Display the DataFrame
print(df)

I want to scrape the larger “Park Factors” table that lists all the team stadiums and has stats such as "wOBACon" and "BACON". I have tried locating it by the “table” tag, but that never finds the table I want. I have tried indexing the list of tables, but that raises an error saying my index is out of range. I have also tried using By.ID instead of By.TAG_NAME with the ID “parkFactors”, to no avail: Selenium reports that no element with that attribute exists, even though the table is visible on the page. I tried to get around that by increasing the time the script waits for the dynamically loaded table, without success. I have also tried the “article-template” and “table-savant” classes, with no luck. Any assistance is greatly appreciated!

1 Answer

The table data is embedded as JSON inside a <script> element, so you can fetch the page with requests and extract the data with the re and json modules:

import json
import re

import pandas as pd
import requests

url = "https://baseballsavant.mlb.com/leaderboard/statcast-park-factors?type=year&year=2024&%20%20%20%20%20%20%20batSide=L&stat=index_wOBA&condition=All&rolling="

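# Fetch the raw HTML; the table rows are embedded as JSON in a <script> block
response = requests.get(url)
response.raise_for_status()

# Capture everything between "data = " and the last ";" on that line
data = re.search(r"data = (.*);", response.text).group(1)
data = json.loads(data)
df = pd.DataFrame(data)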

# print(df)
df["index_woba"] = df["index_woba"].astype(int)

out = df[["venue_name", "index_woba"]].sort_values(
    by=["index_woba", "venue_name"], ascending=[False, True]
)
print(out)

Prints:

                     venue_name  index_woba
10                  Coors Field         113
24             Globe Life Field         105
14     Great American Ball Park         105
4              Kauffman Stadium         105
26               Nationals Park         103
7                 Rogers Centre         102
20                  Truist Park         102
0                 Angel Stadium         101
15                Busch Stadium         101
22           Citizens Bank Park         101
9                 Wrigley Field         101
11               Dodger Stadium         100
17             Minute Maid Park         100
12                     PNC Park         100
27                 Target Field         100
16               loanDepot park         100
8                   Chase Field          99
18                Comerica Park          99
2         Guaranteed Rate Field          99
1   Oriole Park at Camden Yards          99
28               Yankee Stadium          99
13        American Family Field          98
5              Oakland Coliseum          97
6               Tropicana Field          97
25                   Citi Field          96
19                  Oracle Park          96
3             Progressive Field          96
21                   Petco Park          95
23                T-Mobile Park          93
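
If the page layout ever changes, re.search returns None and the script above fails with an AttributeError that is hard to read. A slightly more defensive version of the same extraction might look like the sketch below. The helper name extract_embedded_json and the sample HTML are made up for illustration. It assumes the JSON literal sits on a single line ending in ";" and that the values arrive as strings, which I have not verified beyond what the script above relies on.

```python
import json
import re


def extract_embedded_json(html, var_name="data"):
    """Return the JSON value assigned to `var_name` inside a <script> block.

    Hypothetical helper: assumes the page contains a line such as
    `var data = [...];` with the whole JSON literal on one line.
    """
    # Match "<var_name> = <json>;" where the ";" closes the line.
    pattern = rf"\b{re.escape(var_name)}\s*=\s*(.*?);\s*$"
    match = re.search(pattern, html, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{var_name} = ...;' assignment found in the page")
    return json.loads(match.group(1))


# Made-up stand-in for the real page, so this runs without a network request.
sample_html = """
<html><body>
<script>
    var data = [{"venue_name": "Coors Field", "index_woba": "113"}, {"venue_name": "T-Mobile Park", "index_woba": "93"}];
</script>
</body></html>
"""

rows = extract_embedded_json(sample_html)
park_factors = {row["venue_name"]: int(row["index_woba"]) for row in rows}
print(park_factors)
```

Because the helper takes the HTML as a string, it should work just as well on a copy of the page saved to disk (read it with open(...).read()), which lets you practice without sending repeated requests to the server.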