
Selenium version 3.141, ChromeDriver, Windows 10.

Hello. The objective is to extract the values of specific HTML DOM properties, namely id, href, and data-download-file-url, for each of the images displayed on this website (the website was chosen purely for educational purposes). Other approaches could be used to extract these items, but for the time being I'm using find_elements_by_xpath. I would still welcome suggestions for a more efficient approach that I'm not aware of.

On the aforementioned website, the XPath to the target element is

/html/body/main/section[2]/div/div/figure[X]/div

where the capital X is the image index, which takes values from 1 to 50 on this page. Each figure falls under the class showcase__content.

I tried the following lines

titles_element = browser.find_elements_by_xpath("//div[@class='showcase__content']/a")
# List Comprehension to get the actual repo titles and not the selenium objects.
titles = [x.text for x in titles_element]

However, no DOM properties are extracted into titles_element, so titles is [].

I was also tempted to try the following, but it gives me an error instead:

titles_element = browser.find_elements_by_xpath("//figure[1]/div[@class='showcase__content']//@data-download-file-url")

I would really appreciate it if someone could shed some light on this problem.

Example of the DOM properties for figure 1. The properties are all shown in pink. https://drive.google.com/open?id=190q615C3uXLZUQNI8K4AJYL3Slii1ktO

  • Do you want to get the element in the picture you posted? Commented Apr 15, 2020 at 6:32
  • I would like to extract the three attributes id, href, and data-download-file-url. But if you can add an extra example for getting the picture, you are most welcome. Commented Apr 15, 2020 at 6:35

2 Answers


Now I can get the <img> tags and print the URL of each picture:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.freepik.com/search?dates=any&format=search&page=1&query=Polygonal%20Human&sort=popular")
# Match elements whose class attribute is exactly "lzy landscape lazyload--done"
# (on this page the class is always that value).
result = driver.find_elements_by_css_selector("[class='lzy landscape lazyload--done']")
for i in result:
    print(i.get_attribute('src'))

Result:

https://img.freepik.com/free-vector/innovative-medicine-abstract-composition-with-polygonal-wireframe-images-human-hand-carefully-holding-heart-vector-illustration_1284-30757.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/computer-generated-rendering-hand_41667-189.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/polygonal-wireframe-business-strategy-composition-with-glittering-images-human-hand-incandescent-lamp-with-text_1284-32265.jpg?size=626&ext=jpg
https://img.freepik.com/free-vector/particles-geometric-art-line-dot-engineering_31941-119.jpg?size=626&ext=jpg
........

Or select the showcase__link elements and read the three attributes from them:

result = driver.find_elements_by_css_selector("[class='showcase__link']")
for i in result:
    print(i.get_attribute('href'),i.get_attribute('id'),i.get_attribute('data-download-file-url'))
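
If the links are not yet in the DOM when the lookup runs, an explicit wait is one option. The following is only a sketch, not part of the original answer: `collect_attributes` and `fetch_showcase_links` are hypothetical helper names, the 10-second timeout is an arbitrary assumption, and the selector `a.showcase__link` assumes the links keep that class.

```python
def collect_attributes(elements, names=("id", "href", "data-download-file-url")):
    """Return one dict per element, mapping each attribute name to its value."""
    return [{name: el.get_attribute(name) for name in names} for el in elements]


def fetch_showcase_links(driver, timeout=10):
    """Wait until at least one showcase link is present, then return all of them."""
    # Imported here so collect_attributes can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, "a.showcase__link")
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
```

With a driver that has already loaded the page, `collect_attributes(fetch_showcase_links(driver))` would give a list of dicts, one per link, which is easier to post-process than printed lines.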

3 Comments

I wish I could accept more than one answer. Your suggestion should get extra credit since it also covers extracting the images. May I know whether find_elements_by_css_selector is better than find_elements_by_xpath?
@balandongiv You can read about it in this answer. It seems CSS selectors are faster than XPath, and in your example a CSS selector is also clearer.
Thanks for the info, @jizhihaoSAMA. I will accept your suggestion as the answer since it is more efficient. Thanks.

Try this (explanation in code comments):

from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()

driver.get("https://www.freepik.com/search?dates=any&format=search&page=1&query=Polygonal%20Human&sort=popular")

sleep(1)

# get all the "a" elements by XPath, matching on the class name; find_elements_by_class_name() would work here as well
titles_element = driver.find_elements_by_xpath("//a[@class='showcase__link']")


# loop through the elements and extract the id, href, and data-download-file-url attributes
for element in titles_element:
    element_id = element.get_attribute('id')  # avoid shadowing the built-in id()
    href = element.get_attribute('href')
    file_url = element.get_attribute('data-download-file-url')
    print(element_id, href, file_url)

Output:

dtl-7200954 https://www.freepik.com/free-vector/innovative-medicine-abstract-composition-with-polygonal-wireframe-images-human-hand-carefully-holding-heart-vector-illustration_7200954.htm#page=1&query=Polygonal%20Human&position=0 https://www.freepik.com/download-file/7200954
dtl-4228610 https://www.freepik.com/premium-vector/particles-geometric-art-line-dot-engineering_4228610.htm#page=1&query=Polygonal%20Human&position=1 https://www.freepik.com/download-file/4228610
dtl-7379608 https://www.freepik.com/free-vector/polygonal-wireframe-business-strategy-composition-with-glittering-images-human-hand-incandescent-lamp-with-text_7379608.htm#page=1&query=Polygonal%20Human&position=2 https://www.freepik.com/download-file/7379608
dtl-7200952 https://www.freepik.com/free-vector/two-luminescent-polygonal-wireframe-human-hands-stretching-towards-each-other_7200952.htm#page=1&query=Polygonal%20Human&position=3 https://www.freepik.com/download-file/7200952
.
.
.
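
As for the error in the question: an XPath ending in `//@data-download-file-url` selects attribute nodes, while Selenium's find_elements methods expect the expression to resolve to elements, which is most likely why it was rejected. A hedged alternative is to select the elements that carry the attribute and then read it with get_attribute. `ATTRIBUTE_XPATH` and `download_urls` are hypothetical names used only for this sketch.

```python
# Select <a> elements that HAVE the attribute, rather than the attribute node itself.
ATTRIBUTE_XPATH = "//a[@data-download-file-url]"


def download_urls(driver):
    """Return the data-download-file-url value of every matching link."""
    # find_elements_by_xpath is the Selenium 3 API used elsewhere in this thread.
    links = driver.find_elements_by_xpath(ATTRIBUTE_XPATH)
    return [link.get_attribute("data-download-file-url") for link in links]
```

Called as `download_urls(driver)` after the page has loaded, this should return the same download URLs as the third column of the output above, assuming the attribute only appears on those links.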

1 Comment

I wish I could accept more than one answer. Your suggestion should get extra credit since it includes extra explanation. More importantly, you pointed out that all these attributes are actually located on the element with class showcase__link (I was not aware of this before). Edited: Thanks @Thaer, I have to accept the other suggestion as the answer despite your superb proposal.
