
I am using Selenium WebDriver to collect the URLs of images from a website that is rendered with JavaScript. My code below returns only about 160 of the roughly 240 links. Why might this be? Is it because of the JavaScript rendering?

Is there a way to adjust my code to get around this?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

chrome_options = Options()

# Selenium 3 style; find_elements_by_xpath was removed in Selenium 4
driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
driver.get('https://www.politicsanddesign.com/')
img_url = driver.find_elements_by_xpath("//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    # keep everything before the " 400w" width descriptor in the srcset
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)
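(Editorial aside, not part of the original question: srcset is a comma-separated list of "URL descriptor" candidates, so if splitting on ' 400w' ever proves fragile, a small helper along these lines could parse it explicitly. The helper name is hypothetical.)

def first_srcset_url(srcset_value):
    # srcset is a comma-separated list of "URL [width/density descriptor]"
    # candidates; take the first candidate's URL and drop its descriptor
    first_candidate = srcset_value.split(',')[0].strip()
    url = first_candidate.split()[0]
    # protocol-relative URLs ("//example.com/...") need a scheme prepended
    return 'https:' + url if url.startswith('//') else url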

1 Answer


You need to wait for all of those elements to load.
The recommended approach is an explicit wait using WebDriverWait with expected_conditions.
This code gives me 760-880 elements in the img_url2 list:

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service(r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes aren't treated as escapes
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 10)

url = "https://www.politicsanddesign.com/"

# once the browser opens, turn off the year filter and scroll all the way to
# the bottom; the page does not load all of its elements on first render
driver.get(url)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='responsive-image-wrapper']/img")))
# time.sleep(2)
img_url = driver.find_elements(By.XPATH, "//div[@class='responsive-image-wrapper']/img")

img_url2 = []
for element in img_url:
    # keep the URL portion before the " 400w" width descriptor
    new_srcset = 'https:' + element.get_attribute("srcset").split(' 400w', 1)[0]
    img_url2.append(new_srcset)

I'm not sure how stable this code is, so if needed you can uncomment the time.sleep(2) delay between the wait line and the line that grabs all those img_url elements.

EDIT:

Once the browser opens, you'll need to turn off the page's filter and then scroll all the way to the bottom, as the page does not automatically load all of its elements when it first renders; it only does so once you've interacted with it a little.
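If you'd rather not do that by hand, a scroll loop along these lines may trigger the lazy loading automatically (a minimal sketch; it assumes the page keeps growing as you scroll, and you may still need to turn off the filter manually):

import time

# scroll to the bottom repeatedly until the page height stops growing,
# which should force the remaining lazy-loaded images to render
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the next batch of images time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height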


6 Comments

Thanks for the help! You're getting that many elements? When I do print(len(img_url2)) I get 240.
That's what I saw there. I can attach a screenshot of my PyCharm output.
I've tried playing with the wait time, and I'm still maxing out at around 240. It consistently stops at about the same spot. On the site, it looks like the page reloads after you scroll down for a while. Could that be influencing this?
Ahhh, I see what you did. You have to go to the page, turn off the filter and then scroll to the bottom of the page. Once I did that, it returned 1120 elements.
Weird. Regardless, I got everything once I did that. Thanks for the help!
