
I am trying to scrape event links and contact information from the RaceRoster website (https://raceroster.com/search?q=5k&t=upcoming) using Python, requests, Pandas, and BeautifulSoup. The goal is to extract the Event Name, Event URL, Contact Name, and Email Address for each event and save the data into an Excel file so we can reach out to these events for business development purposes.

However, the script consistently reports that no event links are found on the search results page, despite the links being visible when inspecting the HTML in the browser. Here’s the relevant HTML for the event links from the search results page:

<a href="https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k" 
   target="_blank" 
   rel="noopener noreferrer" 
   class="search-results__card-event-name">
    13th Annual Delaware Tech Chocolate Run 5k
</a>

Steps Taken:

  1. Verified the correct selector for event links:
soup.select("a.search-results__card-event-name")
  2. Checked the response content from the requests.get() call using soup.prettify(). The HTML appears to lack the event links that are visible in the browser, suggesting the content may be loaded dynamically via JavaScript (see the quick check after this list).

  3. Attempted to scrape the data using BeautifulSoup but consistently get:

Found 0 events on the page.
Scraped 0 events.
No contacts were scraped.
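For what it's worth, a quick way to confirm the diagnosis in step 2 is to search the raw response for the event-link class name (a minimal sketch; the user agent string is shortened here):

import requests

# Fetch the raw, unrendered HTML and check whether the event-link class
# seen in the browser's DOM appears in it at all.
headers = {"User-Agent": "Mozilla/5.0"}
html = requests.get("https://raceroster.com/search?q=5k&t=upcoming", headers=headers).text

# Prints False when the links are injected later by JavaScript.
print("search-results__card-event-name" in html)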

What I Need Help With:

  • How can I handle this JavaScript-loaded content? Is there a way to scrape it directly, or do I need to use a tool like Selenium?
  • If Selenium is required, how do I properly integrate it with BeautifulSoup for parsing the rendered HTML?

Current Script:

import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_event_contacts(base_url, search_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    event_contacts = []

    # Fetch the main search page
    print(f"Scraping page: {search_url}")
    response = requests.get(search_url, headers=headers)

    if response.status_code != 200:
        print(f"Failed to fetch page: {search_url}, status code: {response.status_code}")
        return event_contacts

    soup = BeautifulSoup(response.content, "html.parser")
    # Select event links
    event_links = soup.select("a.search-results__card-event-name")


    print(f"Found {len(event_links)} events on the page.")

    for link in event_links:
        event_url = link['href']
        event_name = link.text.strip()  # Extract Event Name

        try:
            print(f"Scraping event: {event_url}")
            event_response = requests.get(event_url, headers=headers)
            if event_response.status_code != 200:
                print(f"Failed to fetch event page: {event_url}, status code: {event_response.status_code}")
                continue

            event_soup = BeautifulSoup(event_response.content, "html.parser")

            # Extract contact name and email
            contact_name = event_soup.find("dd", class_="event-details__contact-list-definition")
            email = event_soup.find("a", href=lambda href: href and "mailto:" in href)

            contact_name_text = contact_name.text.strip() if contact_name else "N/A"
            email_address = email['href'].split("mailto:")[1].split("?")[0] if email else "N/A"

            if contact_name or email:
                print(f"Found contact: {contact_name_text}, email: {email_address}")
                event_contacts.append({
                    "Event Name": event_name,
                    "Event URL": event_url,
                    "Event Contact": contact_name_text,
                    "Email": email_address
                })
            else:
                print(f"No contact information found for {event_url}")
        except Exception as e:
            print(f"Error scraping event {event_url}: {e}")

    print(f"Scraped {len(event_contacts)} events.")
    return event_contacts

def save_to_spreadsheet(data, output_file):
    if not data:
        print("No data to save.")
        return
    df = pd.DataFrame(data)
    df.to_excel(output_file, index=False)
    print(f"Data saved to {output_file}")

if __name__ == "__main__":
    base_url = "https://raceroster.com"
    search_url = "https://raceroster.com/search?q=5k&t=upcoming"
    output_file = "/Users/my_name/Documents/event_contacts.xlsx"

    contact_data = scrape_event_contacts(base_url, search_url)
    if contact_data:
        save_to_spreadsheet(contact_data, output_file)
    else:
        print("No contacts were scraped.")

Expected Outcome:

  • Extract all event links from the search results page.
  • Navigate to each event’s detail page.
  • Scrape the contact name (the dd.event-details__contact-list-definition element) and email (the mailto: link) from the detail page.
  • Save the results to an Excel file.
4 Comments
  • Yep, you need either dryscrape or Selenium; dryscrape is a pain to set up. Commented Jan 3 at 2:00
  • Selenium is the best one Commented Jan 3 at 2:41
  • 1
    I did my best in scraping the data you need, you may extend the rest of outcomes you're thriving for, the stumbling for now is button could not be scrolled into view therefore cannot be clicked. also the contact informations are dynamic for every website sooo there's that. github.com/nikitimi/selenium-scraper.git Commented Jan 3 at 4:15
  • 1
    I've updated the repository, the URLs are now being wrtitten in urls.txt. Commented Jan 3 at 6:19

2 Answers


Use the API endpoint to get the data on upcoming events.

Here's how:

import requests
from tabulate import tabulate
import pandas as pd

url = 'https://search.raceroster.com/search?q=5k&t=upcoming'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

events = requests.get(url, headers=headers).json()['data']

loc_keys = ["address", "city", "country"]

table = [
    [
        event["name"],
        event["url"],
        " ".join([event["location"][key] for key in loc_keys if key in event["location"]])
    ] for event in events
]

columns = ["Name", "URL", "Location"]
print(tabulate(table, headers=columns))

df = pd.DataFrame(table, columns=columns)
df.to_csv('5k_events.csv', index=False, header=True)

This should print:

Name                                         URL                                                                                         Location
-------------------------------------------  ------------------------------------------------------------------------------------------  ----------------------------------------------------------------------------------------------------------------------------
Credit Union Cherry Blossom                  https://raceroster.com/events/2025/72646/credit-union-cherry-blossom                        Washington, D.C. Washington United States
Big Cork Wine Run 5k                         https://raceroster.com/events/2025/98998/big-cork-wine-run-5k                               Big Cork Vineyards, 4236 Main Street, Rohrersville, MD 21779, U.S. Rohrersville United States
3rd Annual #OptOutside Black Friday Fun Run  https://raceroster.com/events/2025/98146/3rd-annual-number-optoutside-black-friday-fun-run  Grain H2O, Summit Harbour Place, Bear, DE, USA Bear United States
Ryan's Race 5K walk Run                      https://raceroster.com/events/2025/97852/ryans-race-5k-walk-run                             Odessa High School, Tony Marchio Drive, Townsend, DE Townsend United States
13th Annual Delaware  Tech Chocolate Run 5k  https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k         Delaware Technical Community College - Charles L. Terry Jr. Campus - Dover, Campus Drive, Dover, DE, USA Dover United States
Builders Dash 5k                             https://raceroster.com/events/2025/99146/builders-dash-5k                                   Rail Haus - Beer Garden, North West Street, Dover, DE Dover United States
The Ivy Scholarship 5k                       https://raceroster.com/events/2025/96874/the-ivy-scholarship-5k                             Hare Pavilion, River Place, Wilmington, DE Wilmington United States
39th Firecracker 5k Run Walk                 https://raceroster.com/events/2025/96907/39th-firecracker-5k-run-walk                       Rockford Tower, Lookout Drive, Wilmington, DE Wilmington United States
24th Annual John D Kelly Logan House 5k      https://raceroster.com/events/2025/97364/24th-annual-john-d-kelly-logan-house-5k            Kelly's Logan House, Delaware Avenue, Wilmington, DE, USA Wilmington United States
2nd Annual Scott Trot 5K                     https://raceroster.com/events/2025/96904/2nd-annual-scott-trot-5k                           American Legion Post 17, American Legion Road, Lewes, DE Lewes United States

Bonus:

To get more event data, just paginate the API with these parameters: l=10&p=1 (for example, https://search.raceroster.com/search?q=5k&l=10&p=1&t=upcoming). Also, note there's a field in meta -> hits that holds the number of found events; for your query that's 1465.
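A rough pagination loop over that endpoint might look like this (a sketch; it only assumes the q, t, l, and p query parameters and the meta -> hits field described above):

import math

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

base = 'https://search.raceroster.com/search'
limit = 10

# The first page also reports the total number of hits (1465 for this query).
first = requests.get(base, params={'q': '5k', 't': 'upcoming', 'l': limit, 'p': 1}, headers=headers).json()
pages = math.ceil(first['meta']['hits'] / limit)

events = list(first['data'])
for page in range(2, pages + 1):
    resp = requests.get(base, params={'q': '5k', 't': 'upcoming', 'l': limit, 'p': page}, headers=headers)
    events.extend(resp.json()['data'])

print(f'Collected {len(events)} events in total')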


5 Comments

Dang, I didn't consider the API endpoints; thanks for this knowledge.
The use of the API endpoint is great, thank you @baduker. Can I also use the API endpoint to fetch the contact name and the contact's email address?
I added a print(events) statement right after I fetched and parsed the API response but I didn't see any contact name or email in the response. Is there another way to fetch these details?
@GuillT there's no contact and email in the initial API response. You still need to visit every single URL and search for those (if there are any); a sketch of that combined approach follows these comments.
Thank you @baduker, your solution worked to extract the URLs, names, and locations, and then I implemented nikitimi's solution below, which extracted the contact information. Thank you both for your help.
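Putting both answers together might look roughly like this (a sketch; the dd class and mailto: lookup are taken from the question's script, so they may not match every event page, and some pages may render contact details with JavaScript):

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

# Event names and URLs come from the search API; contact details are
# looked up on each event page.
events = requests.get('https://search.raceroster.com/search?q=5k&t=upcoming', headers=headers).json()['data']

contacts = []
for event in events:
    page = requests.get(event['url'], headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    name_dd = soup.find('dd', class_='event-details__contact-list-definition')
    mail = soup.find('a', href=lambda h: h and h.startswith('mailto:'))
    contacts.append({
        'Event Name': event['name'],
        'Event URL': event['url'],
        'Event Contact': name_dd.get_text(strip=True) if name_dd else 'N/A',
        'Email': mail['href'].split('mailto:')[1].split('?')[0] if mail else 'N/A',
    })

print(f'Collected contact rows for {len(contacts)} events')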

I've added the raw URLs and titles to my GitHub repository; just change the user_name in save_to_document.py, line 21.

Then run it to start the process of creating a spreadsheet with the outcomes you've stated.

I highly recommend initializing a virtual environment (venv) first.

Output looks like this:

[screenshot of the resulting spreadsheet]
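For reference, the Selenium-plus-BeautifulSoup pattern the repository relies on looks roughly like this (a minimal sketch, not the repository's actual code; the wait target is the event-link class from the question's HTML):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://raceroster.com/search?q=5k&t=upcoming')

# Wait until at least one event link has been rendered by JavaScript.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'a.search-results__card-event-name'))
)

# Hand the rendered page over to BeautifulSoup for parsing.
soup = BeautifulSoup(driver.page_source, 'html.parser')
links = soup.select('a.search-results__card-event-name')
print(f'Found {len(links)} events on the page.')

driver.quit()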

7 Comments

Thank you for this potential solution @nikitimi! It appears you're using Selenium instead of BeautifulSoup. Is there a specific reason for this?
I checked your GitHub repo, but it appears there are many duplicates in the urls.txt and titles.txt files. For example, Lunar New Year Fortune Run 5K/10K/13.1 PHILADELPHIA, which is located on page three of raceroster.com/search?q=5k&t=upcoming, is not included in the event_contacts.xlsx output.
I was using Selenium to wait for the JavaScript to load on the website, because if you use BeautifulSoup alone you'll get no data: the initial paint of the page is still fetching the JavaScript that renders the results.
Thank you @nikitimi, your solution in conjunction with baduker's helped me extract the appropriate event contact information.
Glad we helped you, cheers!