1

I am trying to get the pdf files from this website. I am trying to create a double loop so I can scroll over the years (Season) to get all the main pdf located in each year.

The line of code is not working is this one. The problem is, I can not make this line work (The one that is supposed to loop all over the years (Season):

for year in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#season a aria-valuetext"))):
 year.click() 

This is the full code:

  os.chdir("C:..")
    driver = webdriver.Chrome("chromedriver.exe")
    wait = WebDriverWait(driver, 10)
    driver.get("http://www.motogp.com/en/Results+Statistics/")
    links = []


    for year in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#season a aria-valuetext"))):
     year.click()                                                          
     for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
         item.click()
         elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
         print(elem.get_attribute("href"))
         links.append(elem.get_attribute("href"))
         wait.until(EC.staleness_of(elem))

    driver.quit()

This is a previous post where I got help with the code above:

Scraping pdfs from this web

3
  • What is the problem/question? Commented Jan 16, 2018 at 19:22
  • I updated the question with bold letters. My problem is, the first line of the for loop i.e. for year in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#season a aria-valuetext"))): year.click() is not working as I want it to work. I want to click all the seasons or years to get the pdfs that you can get of all the events for each year, the second line it does work: for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))): Commented Jan 16, 2018 at 19:46
  • "The line of code is not working is this one" is not a useful description. You should edit your post to include what you expect the code to do and what it actually does instead. Commented Jan 16, 2018 at 20:56

2 Answers 2

2

The solution below should work for you. First, we iterate over the # of years in the CSS slider. Then we work the list using your code example. Added a sleep command because I kept getting a timeout.

CODE

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Chrome("chromedriver.exe")
wait = WebDriverWait(driver, 10)
driver.get("http://www.motogp.com/en/Results+Statistics/")

slider = driver.find_element_by_xpath('//*[@id="handle_season"]')

for year in range(68):
    wait.until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="event"]')))    
    for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
        item.click()
        elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
        print(elem.get_attribute("href"))
        wait.until(EC.staleness_of(elem))

    slider.send_keys(Keys.ARROW_LEFT)
    time.sleep(1)


driver.quit()

Result:

Screenshot of result

Sign up to request clarification or add additional context in comments.

1 Comment

@Shahin Nice catch, updated the post to include that import. And thanks for the feedback. :)
1

If your working behind a firewall then a lot of times your EC’s won’t work. See if a time.sleep(10) function doesn’t get you past it, instead of an EC. Secondly, check the page_source before you run the EC... if you’re behind a firewall the HTML source code will tell you.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.