For a personal project, I am trying to scrape this webpage:

https://www.ebay.com/b/Jordan-11-Retro-Cool-Grey-2001/15709/bn_7117643306

I am trying to get all the image URLs using Selenium.

Here is the code:

from selenium import webdriver

url = 'https://www.ebay.com/b/Jordan-11-Retro-Cool-Grey-2001/15709/bn_7117643306'

# open the page
browser = webdriver.Chrome('/Users/mreznik/V5/chromedriver')
browser.implicitly_wait(2)
browser.get(url)

# print the src of every <img> element on the page
elems = browser.find_elements_by_tag_name("img")
for elem in elems:
    print(elem.get_attribute('src'))

and it gets me a list of results:

...
https://i.ebayimg.com/thumbs/images/g/M-sAAOSwahdgrd0x/s-l300.webp
https://i.ebayimg.com/thumbs/images/g/bpUAAOSwoa9gtlWw/s-l300.webp
https://ir.ebaystatic.com/cr/v/c1/s_1x2.gif
...

As one can see by running this, there are listings on the page whose image URLs are not in the list, and, stranger yet, URLs in the list for images that are not on the page!

How can I get this right?

  • What do you wish to get? Only the links of the product images? Commented Jun 7, 2021 at 13:58
  • Yes, thank you: only those. Commented Jun 7, 2021 at 14:06

1 Answer


You should get only the elements containing the product images.
Please try this:

# restrict the search to <img> tags inside listing ("s-item") containers
product_img_xpath = '//div[contains(@class,"s-item")]//img'
elems = browser.find_elements_by_xpath(product_img_xpath)
for elem in elems:
    print(elem.get_attribute('src'))

Don't forget to add some delay / wait before getting the list of elements, something like this:

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 20)

product_img_xpath = '//div[contains(@class,"s-item")]//img'
# wait until at least one product image is visible, then give the page a moment to settle
wait.until(EC.visibility_of_element_located((By.XPATH, product_img_xpath)))
time.sleep(1)

imgs = browser.find_elements_by_xpath(product_img_xpath)
for img in imgs:
    print(img.get_attribute('src'))

UPD
In case you are still not getting all the elements in the list, please try scrolling to each element before accessing its properties.

import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

wait = WebDriverWait(browser, 20)
actions = ActionChains(browser)

product_img_xpath = '//div[contains(@class,"s-item")]//img'
wait.until(EC.visibility_of_element_located((By.XPATH, product_img_xpath)))
time.sleep(1)

imgs = browser.find_elements_by_xpath(product_img_xpath)
for img in imgs:
    # scroll the image into view so its src is actually loaded, then read it
    actions.move_to_element(img).perform()
    print(img.get_attribute('src'))

7 Comments

At least on my machine, this gets around half of the listings' images: far from all of them, and no more than my own solution.
Please try the updated answer; this should work correctly.
I'm sorry, but it does not. Does it work on your machine?
I do not have Python on my computer at all, but I have some experience with Selenium... What exactly doesn't work now? Are you getting too short a list of elements with imgs = browser.find_elements_by_xpath(product_img_xpath)? Did you try exactly what I wrote here? I have edited the answer 2-3 times.
In case it still doesn't get all the element links, please see the updated answer: scroll to the elements before getting their src attribute.
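
For reference, here is a minimal sketch of the workaround these comments circle around: scrolling through the whole page so eBay's lazy-loaded thumbnails are actually fetched before their src attributes are read. The scroll step, the pause length, and the data-src fallback are assumptions, not something confirmed in the thread, so treat it as a starting point rather than a drop-in fix.

import time

# hypothetical sketch, not from the answer above: scroll the full page in steps so
# lazily loaded thumbnails receive real URLs, then collect the product image links
product_img_xpath = '//div[contains(@class,"s-item")]//img'  # same XPath as the answer

page_height = browser.execute_script("return document.body.scrollHeight")
position = 0
while position < page_height:
    position += 800                                  # scroll down roughly one viewport at a time
    browser.execute_script(f"window.scrollTo(0, {position});")
    time.sleep(0.5)                                  # give the lazy loader time to swap images in
    page_height = browser.execute_script("return document.body.scrollHeight")

urls = set()
for img in browser.find_elements_by_xpath(product_img_xpath):
    # some lazy-loaded <img> tags keep the real URL in data-src until they are shown
    src = img.get_attribute('src') or img.get_attribute('data-src')
    if src and 'ebayimg.com' in src:
        urls.add(src)

for url in sorted(urls):
    print(url)

The set de-duplicates repeated thumbnails, and filtering on ebayimg.com drops non-product assets such as the s_1x2.gif tracking pixel shown in the question's output.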