
I'm trying to scrape the titles from a website, but my code only returns one title. How can I get all the titles?

Below is one of the elements I'm trying to fetch using XPath (starts-with):

<div id="post-4550574" class="post-box    " data-permalink="https://hypebeast.com/2019/4/undercover-nike-sfb-mountain-sneaker-release-info" data-title="The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date"><div class="post-box-image-container fixed-ratio-3-2">

This is my current code:

from selenium import webdriver
import requests
from bs4 import BeautifulSoup as bs

driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
driver.get('https://hypebeast.com/search?s=nike+undercover')

element = driver.find_element_by_xpath(".//*[starts-with(@id, 'post-')]")
print(element.get_attribute('data-title'))

Output: The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date

I was expecting a lot more titles, but it only returns one result.

1 Comment

  • You need multiple find_elements_, not a single find_element_. – Commented Apr 11, 2019 at 8:55

4 Answers


To extract the product titles: as the desired elements are JavaScript-rendered, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies (a complete runnable sketch follows the console output below):

  • XPATH:

    driver.get('https://hypebeast.com/search?s=nike+undercover')
    print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))])
    
  • CSS_SELECTOR:

    driver.get('https://hypebeast.com/search?s=nike+undercover')
    print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2>span")))])
    
  • Console Output:

    ['The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date', 'The UNDERCOVER x Nike SFB Mountain Surfaces in "Dark Obsidian/University Red"', 'A First Look at UNDERCOVER’s Nike SFB Mountain Collaboration', "Here's Where to Buy the UNDERCOVER x Gyakusou Nike Running Models", 'Take Another Look at the Upcoming UNDERCOVER x Nike Daybreak', "Take an Official Look at GYAKUSOU's SS19 Footwear and Apparel Range", 'UNDERCOVER x Nike Daybreak Expected to Hit Shelves This Summer', "The 10 Best Sneakers From Paris Fashion Week's FW19 Runways", "UNDERCOVER FW19 Debuts 'A Clockwork Orange' Theme, Nike & Valentino Collabs", 'These Are the Best Sneakers of 2018']
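
A complete, self-contained sketch of the XPATH variant, assuming Selenium 4+ (which resolves the ChromeDriver binary automatically; older versions need an explicit driver path):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes Selenium 4+; pass the chromedriver path on older versions
driver.get('https://hypebeast.com/search?s=nike+undercover')

# Wait up to 30 seconds for the JavaScript-rendered title elements to become visible
titles = [element.text for element in WebDriverWait(driver, 30).until(
    EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))]
print(titles)

driver.quit()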
    

3 Comments

Thank you Debanjan for providing the answer. Yours was the only one that worked for me. For anyone visiting the thread: I tried using find_elements, but it failed with an error saying no attribute was found. I'm still a newbie in this area, so I need to find out why elements alone would not work in this particular case. @DebanjanB, would you know the answer to that?
I have updated the wording of my answer. As the product titles are JavaScript-rendered elements, you have to induce WebDriverWait for visibility_of_all_elements_located(); find_elements alone will return 0 elements.
@DebanjanB Thank you for providing further details; it certainly helps me learn more and avoid duplicate questions.

You don't need selenium. You can use requests, which is faster, and target the data-title attribute:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://hypebeast.com/search?s=nike+undercover')
soup = bs(r.content, 'lxml')
titles = [item['data-title'] for item in soup.select('[data-title]')]
print(titles)

If you do want selenium, the matching syntax is:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')
titles = [item.get_attribute('data-title') for item in driver.find_elements_by_css_selector('[data-title]')]
print(titles)   
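
Note that the find_elements_by_* helpers were removed in newer Selenium releases; a minimal sketch of the equivalent call using the By locator API, assuming Selenium 4+, would be:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')

# find_elements with a By locator replaces the removed find_elements_by_css_selector helper
titles = [item.get_attribute('data-title')
          for item in driver.find_elements(By.CSS_SELECTOR, '[data-title]')]
print(titles)
driver.quit()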

1 Comment

What didn't work with this answer? Did you get an error message? I tried both successfully.

If a locator matches multiple elements, find_element returns only the first one. find_elements returns a list of all elements found by the locator, which you can then iterate over to get every element.

If all of the elements you are trying to find have the class post-box, you could locate them by class name, as in the sketch below.
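
A minimal sketch of that approach, assuming the post boxes still carry the post-box class and the data-title attribute, and using the current find_elements/By API:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')

# find_elements (plural) returns every match; find_element would return only the first.
# If the results are JavaScript-rendered, wrap this in a WebDriverWait as in the accepted answer.
posts = driver.find_elements(By.CLASS_NAME, 'post-box')
for post in posts:
    print(post.get_attribute('data-title'))
driver.quit()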



Just sharing my experience and what I've used; it might help someone. Just use:

element.get_attribute('ATTRIBUTE-NAME')

