
I am trying to get all the href links from https://search.yhd.com/c0-0-1003817/ (the ones that lead to the specific products), but although my code runs, it only gets 30 links. I don't know why this is happening. Could you help me, please?

I've been working with Selenium (Python 3.7), but before that I also tried to get the links with Beautiful Soup. That didn't work either.

from selenium import webdriver 
import time
import requests
import pandas as pd

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.get(link)
    time.sleep(3)

    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

I should get 60 links, but I am only managing to get 30 with my code.

  • You can get the HTML page source via the page_source attribute of Selenium's driver and then use BeautifulSoup's find_all function to collect all the anchor tags. That gives you every link on the page, and you can then filter the hrefs as needed. Commented Apr 18, 2019 at 5:08

1 Answer

At initial load, the page contains only 30 images/links; only when you scroll down does it load the remaining items, for a total of 60. You need to do the following:

def getListingLinks(link):
    # Open the driver
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(link)
    time.sleep(3)
    # scroll down: repeated to ensure it reaches the bottom and all items are loaded
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)

    # Save the links
    listing_links = []
    links = driver.find_elements_by_xpath('//a[@class="img"]')
    for link in links:
        listing_links.append(str(link.get_attribute('href')))
    driver.close()
    return listing_links

imported = getListingLinks("https://search.yhd.com/c0-0-1003817/")

print(len(imported))  ## Output:  60

2 Comments

It works perfectly, thank you! Just so I understand: why are these lines repeated? driver.execute_script("window.scrollTo(0, document.body.scrollHeight);") time.sleep(3) I tried removing one of the repetitions and then it only gets 30 links, so I see they make the code work, but why?
It doesn't reach all the way to the bottom when it scrolls only once. You have to play with repeated scrolling and wait times to get it working on different sites. I added a comment in the code above.
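Instead of hard-coding the number of scrolls, the "scroll and wait" idea in this thread can be generalized: keep scrolling until document.body.scrollHeight stops growing. This is a sketch of that pattern, not the answerer's exact code; `scroll_to_bottom` and its `pause`/`max_rounds` parameters are my own names, and `driver` is any Selenium WebDriver:

```python
import time

def scroll_to_bottom(driver, pause=3, max_rounds=10):
    """Scroll down until the page height stops changing (or max_rounds is hit)."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, we are at the bottom
        last_height = new_height
```

You would call `scroll_to_bottom(driver)` right after `driver.get(link)` and before collecting the links; tune `pause` per site.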
