I've written a Python script to scrape names from a slow-loading webpage. The page holds 1000 names, and the full content loads only once the browser has scrolled all the way to the bottom. My script does reach the bottom of the page and parses all the names. The problem is that I've used a hardcoded delay (5 seconds in this case), which makes the browser wait unnecessarily even when the items have already loaded. How can I use an explicit wait instead and still parse all the items?

Here is the script I've written so far:

from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("http://fortune.com/fortune500/list/")

check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
    height = driver.execute_script("return document.body.scrollHeight;") 
    if height == check_height: 
        break 
    check_height = height

# Collect the names once, after scrolling has finished
listElements = driver.find_elements_by_css_selector(".company-title")

for item in listElements:
    print(item.text)

2 Answers

1

You can add an explicit wait as shown below:

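from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("http://fortune.com/fortune500/list/")

check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Wait up to 10 seconds for the page to grow after scrolling
        WebDriverWait(driver, 10).until(
            lambda driver: driver.execute_script("return document.body.scrollHeight;") > check_height
        )
        check_height = driver.execute_script("return document.body.scrollHeight;")
    except TimeoutException:
        # Height unchanged within the timeout: treat this as the bottom of the page
        break

# find_elements(By.CSS_SELECTOR, ...) is used because newer Selenium 4 releases
# no longer provide find_elements_by_css_selector
listElements = driver.find_elements(By.CSS_SELECTOR, ".company-title")
for item in listElements:
    print(item.text)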

This lets you avoid hardcoding time.sleep(): instead, you wait only until the height value changes, and break out of the loop if the height is still unchanged 10 seconds after scrolling.
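
To make the difference from a fixed sleep concrete, here is a minimal pure-Python sketch of the polling idea behind an explicit wait: check a condition at short intervals and return as soon as it holds, giving up only after a timeout. It is a simplified illustration, not Selenium's actual implementation. The names wait_until, FakePage and scroll_until_stable are hypothetical, and FakePage merely simulates a page whose height grows after each scroll.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # return immediately, no unnecessary waiting
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a lazily loading page (assumption, not a real API)."""

    def __init__(self, heights):
        self._heights = list(heights)
        self._index = 0

    def scroll_to_bottom(self):
        # Each scroll "loads" the next chunk until no more content is left
        if self._index < len(self._heights) - 1:
            self._index += 1

    def height(self):
        return self._heights[self._index]


def scroll_until_stable(page, timeout=0.2, poll=0.01):
    """Scroll repeatedly; stop once the height no longer grows within `timeout`."""
    last_height = page.height()
    while True:
        page.scroll_to_bottom()
        try:
            wait_until(lambda: page.height() > last_height, timeout, poll)
        except TimeoutError:
            return last_height  # no growth: the bottom has been reached
        last_height = page.height()


final_height = scroll_until_stable(FakePage([100, 250, 400]))
print(final_height)
```

Only the final scroll, where nothing new loads, costs the full timeout; every earlier scroll returns as soon as the height has grown.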

1 Comment

When it comes to providing a solution for Selenium with the Python bindings, Sir Andersson is second to none. You are just awesome. Thanks a lot.
0

You need to use explicit waits, like this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()

More details here: http://selenium-python.readthedocs.io/waits.html
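
A condition does not have to be one of the built-in expected_conditions: WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value once the wait should end. As a hedged sketch closer to the scrolling case in the question, the hypothetical helper more_items_than below builds such a condition from a CSS selector and a previous element count. FakeDriver is likewise made up so that the condition can be tried without a browser.

```python
def more_items_than(css_selector, previous_count):
    """Build a condition for WebDriverWait.until(): truthy once more elements match."""

    def condition(driver):
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        items = driver.find_elements("css selector", css_selector)
        return items if len(items) > previous_count else False

    return condition


# Intended use with a real driver (not executed here):
#   items = WebDriverWait(driver, 10).until(
#       more_items_than(".company-title", len(items))
#   )


class FakeDriver:
    """Hypothetical stand-in exposing only find_elements(by, value)."""

    def __init__(self, items):
        self._items = list(items)

    def find_elements(self, by, value):
        return list(self._items)


condition = more_items_than(".company-title", 2)
print(condition(FakeDriver(["a", "b"])))       # not enough items yet
print(condition(FakeDriver(["a", "b", "c"])))  # a third item has appeared
```

Waiting on the element count rather than on the page height ties the wait directly to the data being scraped, at the cost of one extra element lookup per poll.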

1 Comment

It doesn't seem that the OP is looking for the basics of implementing an explicit wait, but for a specific solution...
