
I am using Selenium WebDriver to collect the URLs of images from a website that is rendered with JavaScript. My code below returns only about 160 of the roughly 240 links. Why might this be? Is it because of the JavaScript rendering?

Is there a way to adjust my code to get around this?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()

# Selenium 3 style; find_elements_by_xpath was removed in Selenium 4
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_elements_by_xpath("//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    # keep everything before the " 400w" width descriptor in the srcset
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)
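(Editorial aside, not part of the original question: srcset is a comma-separated list of "URL descriptor" candidates, so if splitting on ' 400w' ever proves fragile, a small helper along these lines could parse it explicitly. The helper name is hypothetical.)

def first_srcset_url(srcset_value):
    # srcset is a comma-separated list of "URL [width/density descriptor]"
    # candidates; take the first candidate's URL and drop its descriptor
    first_candidate = srcset_value.split(',')[0].strip()
    url = first_candidate.split()[0]
    # protocol-relative URLs ("//example.com/...") need a scheme prepended
    return 'https:' + url if url.startswith('//') else url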

1 Answer


You need to wait for all of those elements to load.
The recommended approach is an explicit wait using WebDriverWait with expected_conditions.
This code gives me 760-880 elements in the img_url2 list:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes aren't treated as escapes
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)

url = "https://www.politicsanddesign.com/"

# once the browser opens, turn off the year filter and scroll all the way to
# the bottom; the page does not load all of its elements on first render
driver.get(url)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='responsive-image-wrapper']/img")))
# time.sleep(2)
img_url = driver.find_elements(By.XPATH, "//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    # keep the URL portion before the " 400w" width descriptor
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)

I'm not sure how stable this code is, so if needed you can uncomment the time.sleep(2) delay between the wait line and the line that grabs all those img_url elements.

EDIT:

Once the browser opens, you'll need to turn off the page's filter and then scroll all the way to the bottom, as the page does not automatically load all of its elements when it first renders; it only does so once you've interacted with it a little.
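If you'd rather not do that by hand, a scroll loop along these lines may trigger the lazy loading automatically (a minimal sketch; it assumes the page keeps growing as you scroll, and you may still need to turn off the filter manually):

import time

# scroll to the bottom repeatedly until the page height stops growing,
# which should force the remaining lazy-loaded images to render
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the next batch of images time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height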


6 Comments

Thanks for the help! You're getting that many elements? When I do print(len(img_url2)) I get 240.
That's what I saw there. I can attach a screenshot of my PyCharm output.
I've tried playing with the wait time, and I'm still maxing out at around 240. It consistently stops at about the same spot. On the site, it looks like the page reloads after you scroll down for a while. Could that be influencing this?
Ahhh, I see what you did. You have to go to the page, turn off the filter and then scroll to the bottom of the page. Once I did that, it returned 1120 elements.
Weird. Regardless, I got everything once I did that. Thanks for the help!
