
I'm trying to scrape the titles from a website, but my code only returns one title. How can I get all the titles?

Below is one of the elements I'm trying to fetch using XPath (starts-with):

<div id="post-4550574" class="post-box    " data-permalink="https://hypebeast.com/2019/4/undercover-nike-sfb-mountain-sneaker-release-info" data-title="The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date"><div class="post-box-image-container fixed-ratio-3-2">

This is my current code:

from selenium import webdriver
import requests
from bs4 import BeautifulSoup as bs

driver = webdriver.Chrome('/Users/Documents/python/Selenium/bin/chromedriver')
driver.get('https://hypebeast.com/search?s=nike+undercover')

element = driver.find_element_by_xpath(".//*[starts-with(@id, 'post-')]")
print(element.get_attribute('data-title'))

Output: The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date

I was expecting a lot more titles, but it only returns one result.

1 Comment

  • You need multiple find_elements_, not a single find_element_. – Commented Apr 11, 2019 at 8:55

4 Answers


To extract the product titles: as the desired elements are JavaScript-rendered, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies (a complete runnable sketch follows the console output below):

  • XPATH:

    driver.get('https://hypebeast.com/search?s=nike+undercover')
    print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))])
    
  • CSS_SELECTOR:

    driver.get('https://hypebeast.com/search?s=nike+undercover')
    print([element.text for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2>span")))])
    
  • Console Output:

    ['The UNDERCOVER x Nike SFB Mountain Pack Gets a Release Date', 'The UNDERCOVER x Nike SFB Mountain Surfaces in "Dark Obsidian/University Red"', 'A First Look at UNDERCOVER’s Nike SFB Mountain Collaboration', "Here's Where to Buy the UNDERCOVER x Gyakusou Nike Running Models", 'Take Another Look at the Upcoming UNDERCOVER x Nike Daybreak', "Take an Official Look at GYAKUSOU's SS19 Footwear and Apparel Range", 'UNDERCOVER x Nike Daybreak Expected to Hit Shelves This Summer', "The 10 Best Sneakers From Paris Fashion Week's FW19 Runways", "UNDERCOVER FW19 Debuts 'A Clockwork Orange' Theme, Nike & Valentino Collabs", 'These Are the Best Sneakers of 2018']
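
A complete, self-contained sketch of the XPATH variant, assuming Selenium 4+ (which resolves the ChromeDriver binary automatically; older versions need an explicit driver path):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumes Selenium 4+; pass the chromedriver path on older versions
driver.get('https://hypebeast.com/search?s=nike+undercover')

# Wait up to 30 seconds for the JavaScript-rendered title elements to become visible
titles = [element.text for element in WebDriverWait(driver, 30).until(
    EC.visibility_of_all_elements_located((By.XPATH, "//h2/span")))]
print(titles)

driver.quit()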
    

3 Comments

Thank you Debanjan for providing the answer. Yours was the only one that worked for me. For anyone visiting the thread: I tried using find_elements, but it failed with an error saying no attribute was found. I'm still a newbie in this area, so I need to find out why elements alone would not work in this particular case. @DebanjanB, would you know the answer to that?
I have updated the wording of my answer. As the product titles are JavaScript-rendered elements, you have to induce WebDriverWait for visibility_of_all_elements_located(); find_elements alone will return 0 elements.
@DebanjanB Thank you for providing further details; it certainly helps me learn more and avoid duplicate questions.

You don't need selenium. You can use requests, which is faster, and target the data-title attribute:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://hypebeast.com/search?s=nike+undercover')
soup = bs(r.content, 'lxml')
titles = [item['data-title'] for item in soup.select('[data-title]')]
print(titles)

If you do want selenium, the matching syntax is:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')
titles = [item.get_attribute('data-title') for item in driver.find_elements_by_css_selector('[data-title]')]
print(titles)   
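
Note that the find_elements_by_* helpers were removed in newer Selenium releases; a minimal sketch of the equivalent call using the By locator API, assuming Selenium 4+, would be:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')

# find_elements with a By locator replaces the removed find_elements_by_css_selector helper
titles = [item.get_attribute('data-title')
          for item in driver.find_elements(By.CSS_SELECTOR, '[data-title]')]
print(titles)
driver.quit()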

1 Comment

What didn't work with this answer? Did you get an error message? I tried both successfully.

If a locator matches multiple elements, find_element returns only the first one. find_elements returns a list of all elements found by the locator, which you can then iterate over to get every element.

If all of the elements you are trying to find have the class post-box, you could locate them by class name, as in the sketch below.
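
A minimal sketch of that approach, assuming the post boxes still carry the post-box class and the data-title attribute, and using the current find_elements/By API:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://hypebeast.com/search?s=nike+undercover')

# find_elements (plural) returns every match; find_element would return only the first.
# If the results are JavaScript-rendered, wrap this in a WebDriverWait as in the accepted answer.
posts = driver.find_elements(By.CLASS_NAME, 'post-box')
for post in posts:
    print(post.get_attribute('data-title'))
driver.quit()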



Just sharing my experience and what I've used; it might help someone. Just use:

element.get_attribute('ATTRIBUTE-NAME')

