I'm trying to scrape the player statistics in the Totals table at this link: http://www.basketball-reference.com/players/j/jordami01.html. The table is difficult to scrape in the form it has when the page first loads, but clicking 'CSV' right above the table converts it to CSV, which would be much easier to parse.

I'm having trouble getting Selenium to click the CSV button. Here is my code:

import urllib2
from bs4 import BeautifulSoup
from selenium import webdriver

player_link = "http://www.basketball-reference.com/players/j/jordami01.html"

browser = webdriver.Firefox()
browser.get(player_link)
elem = browser.find_element_by_xpath("//span[@class='tooltip' and @onlick='table2csv('totals')']")
elem.click()

When I run this, a Firefox window opens, but the code never switches the table from its original format to CSV. The CSV version only appears in the page source after I click CSV manually. How can I get Selenium to click that CSV button so that I can then scrape the data with BeautifulSoup?

1 Answer

You don't need BeautifulSoup here. Click the CSV button with Selenium, extract the contents of the pre element that appears with the CSV data, and parse it with the built-in csv module:

import csv
from StringIO import StringIO

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

player_link = "http://www.basketball-reference.com/players/j/jordami01.html"

browser = webdriver.Firefox()
wait = WebDriverWait(browser, 10)
browser.set_page_load_timeout(10)

# stop load after a timeout
try:
    browser.get(player_link)
except TimeoutException:
    browser.execute_script("window.stop();")

# click "CSV"
elem = wait.until(EC.presence_of_element_located((By.XPATH,  "//div[@class='table_heading']//span[. = 'CSV']")))
elem.click()

# get CSV data
csv_data = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "pre#csv_totals"))).text.encode("utf-8")
browser.close()

# read CSV
reader = csv.reader(StringIO(csv_data))
for line in reader:
    print(line)
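
The parsing step can be tried on its own, without a browser. The following is a minimal sketch, assuming Python 3 (where io.StringIO replaces the StringIO module used above). The helper name parse_csv_text and the sample text are hypothetical placeholders, not real data from the site; in practice the text would come from the pre element read by Selenium.

```python
import csv
import io


def parse_csv_text(csv_text):
    """Parse CSV text into (header, rows), where rows is a list of dicts.

    parse_csv_text is a hypothetical helper name used for illustration.
    """
    # newline="" lets the csv module handle line endings itself
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    rows = [dict(row) for row in reader]
    return reader.fieldnames, rows


# Placeholder text standing in for the contents of the pre element.
# The column names and values are invented for this example.
sample = "Season,Tm,G\nYYYY-YY,AAA,1\nYYYY-YY,BBB,2\n"

header, rows = parse_csv_text(sample)
print(header)
for row in rows:
    print(row["Season"], row["G"])
```

Using csv.DictReader instead of csv.reader keys each value by its column name, which may be more convenient when only a few columns are needed.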