
Link to the page I am trying to scrape:

https://www.nytimes.com/reviews/dining

Because this page has a "show more" button, I need Selenium to click "show more" repeatedly, and then somehow use Beautiful Soup to harvest the links to each individual restaurant review on the page. Inspecting the page, the links I want are inside the href attribute of each review's anchor tag, e.g. https://...onigiri.html.

Code so far:

url = "https://www.nytimes.com/reviews/dining"
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get(url)

# Note: range(1) only runs once; I want to click "show more" repeatedly.
for i in range(1):
  button = driver.find_element_by_tag_name("button")
  button.click()

How do I use WebDriverWait and BeautifulSoup [BeautifulSoup(driver.page_source, 'html.parser')] to complete this task?
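From the examples I've found, I think the BeautifulSoup half would look something like the sketch below once the page is fully expanded. The HTML string here is just a stand-in for `driver.page_source` (the real markup, class names, and URLs will differ):

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source after all "show more" clicks
# (the real markup will differ).
html = """
<section>
  <a href="https://www.nytimes.com/2020/04/14/dining/onigiri.html">Review</a>
  <a href="https://www.nytimes.com/2020/04/07/dining/ramen.html">Review</a>
  <a href="#top">Back to top</a>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
# Keep only hrefs that look like full review URLs.
links = [a["href"] for a in soup.find_all("a", href=True)
         if a["href"].startswith("https://")]
print(links)
```

But I'm not sure how to combine this with WebDriverWait so the parse only happens after the content has loaded.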

  • Can you be more specific about which part you're struggling with? You probably don't need BeautifulSoup for this, by the way. Commented Apr 15, 2020 at 16:29
  • What have you tried? Did you look at other examples using WebDriverWait? And what links are you trying to scrape? You can most likely just use Selenium to get them and don't need BeautifulSoup at all. Commented Apr 15, 2020 at 16:33
  • @AMC yup! I've just included a photo in my problem to further clarify which links I am trying to scrape. Commented Apr 15, 2020 at 20:30
  • @Code-Apprentice I've tried looking at the WebDriverWait documentation — there are locators like find_element_by_tag_name, find_element_by_xpath, and find_element_by_css_selector, but I'm not quite sure how to apply the examples I've found around the internet to my particular problem just yet. Commented Apr 15, 2020 at 20:31

2 Answers


Go to https://www.nytimes.com/reviews/dining, press F12, and then press Ctrl+Shift+C to select the Show More element; from there you can copy that element's XPath.

To learn how to locate elements by XPath, see:

https://www.techbeamers.com/locate-elements-selenium-python/#locate-element-by-xpath

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def executeTest():
    global driver
    driver.get('https://www.nytimes.com/reviews/dining')
    time.sleep(7)  # crude wait for the page to finish loading
    element = driver.find_element_by_xpath('Your_Xpath')  # paste the XPath you copied
    element.click()
    time.sleep(3)

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-infobars")
    driver = webdriver.Chrome(chrome_options=options)

if __name__ == "__main__":
    startWebDriver()
    executeTest()
    driver.quit()



This is a lazy-loading page. To keep clicking the Show More button, use an infinite loop: scroll to the bottom of the page, wait for the new content to load, collect the hrefs into a list, then click Show More. Compare the list's length before and after each pass; if nothing new was added, break out of the loop.

Code:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://www.nytimes.com/reviews/dining")
# Accept the cookie banner before interacting with the page
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Accept']"))).click()
listhref = []

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.css-gg4vpm")))
    lenlistbefore = len(listhref)
    for ele in elements:
        if ele.get_attribute("href") in listhref:
            continue
        else:
            listhref.append(ele.get_attribute("href"))

    lenlistafter = len(listhref)

    # No new links were collected on this pass, so we're done
    if lenlistbefore == lenlistafter:
        break

    button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//button[text()='Show More']")))
    driver.execute_script("arguments[0].click();", button)
    time.sleep(2)
print(len(listhref))
print(listhref)
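As a side note, the duplicate check and before/after length comparison can be factored into a small standalone helper. This is purely a stylistic sketch of the same logic, not a required change:

```python
def merge_new_hrefs(collected, hrefs):
    """Append any hrefs not already in `collected`; return True if the list grew."""
    before = len(collected)
    for h in hrefs:
        if h not in collected:
            collected.append(h)
    return len(collected) > before

links = ["a.html", "b.html"]
assert merge_new_hrefs(links, ["b.html", "c.html"])      # c.html is new, keep looping
assert not merge_new_hrefs(links, ["a.html", "b.html"])  # nothing new, time to break
print(links)
```

In the loop, `while merge_new_hrefs(listhref, hrefs): ...` replaces the explicit lenlistbefore/lenlistafter bookkeeping.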

Note: I get a list count of 499.

1 Comment

Thank you so much! This worked after one adjustment to the line WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//button[text()='Accept']"))).click() — I just changed the XPath to "//button[text()='Show More']", which was an easy and convenient fix.
