
Link to the page I am trying to scrape:

https://www.nytimes.com/reviews/dining

Because this page has a "show more" button, I need Selenium to click "show more" repeatedly, and then somehow use Beautiful Soup to harvest the links to each individual restaurant review on the page. Inspecting the page, the links I want are inside the href attribute of each review's anchor tag, e.g. https://...onigiri.html.

Code so far:

url = "https://www.nytimes.com/reviews/dining"
driver = webdriver.Chrome('chromedriver', chrome_options=chrome_options)
driver.get(url)

# Note: range(1) only runs once; I want to click "show more" repeatedly.
for i in range(1):
  button = driver.find_element_by_tag_name("button")
  button.click()

How do I use WebDriverWait and BeautifulSoup [BeautifulSoup(driver.page_source, 'html.parser')] to complete this task?
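From the examples I've found, I think the BeautifulSoup half would look something like the sketch below once the page is fully expanded. The HTML string here is just a stand-in for `driver.page_source` (the real markup, class names, and URLs will differ):

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source after all "show more" clicks
# (the real markup will differ).
html = """
<section>
  <a href="https://www.nytimes.com/2020/04/14/dining/onigiri.html">Review</a>
  <a href="https://www.nytimes.com/2020/04/07/dining/ramen.html">Review</a>
  <a href="#top">Back to top</a>
</section>
"""

soup = BeautifulSoup(html, "html.parser")
# Keep only hrefs that look like full review URLs.
links = [a["href"] for a in soup.find_all("a", href=True)
         if a["href"].startswith("https://")]
print(links)
```

But I'm not sure how to combine this with WebDriverWait so the parse only happens after the content has loaded.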

  • Can you be more specific about which part you're struggling with? You probably don't need BeautifulSoup for this, by the way. Commented Apr 15, 2020 at 16:29
  • What have you tried? Did you look at other examples using WebDriverWait? And what links are you trying to scrape? You can most likely just use Selenium to get them and don't need BeautifulSoup at all. Commented Apr 15, 2020 at 16:33
  • @AMC yup! I've just included a photo in my problem to further clarify which links I am trying to scrape. Commented Apr 15, 2020 at 20:30
  • @Code-Apprentice I've tried looking at the WebDriverWait documentation — there are locators like find_element_by_tag_name, find_element_by_xpath, and find_element_by_css_selector, but I'm not quite sure how to apply the examples I've found around the internet to my particular problem just yet. Commented Apr 15, 2020 at 20:31

2 Answers


Go to https://www.nytimes.com/reviews/dining, press F12, and then press Ctrl+Shift+C to select the Show More element; from there you can copy that element's XPath.

To learn how to locate elements by XPath, see:

https://www.techbeamers.com/locate-elements-selenium-python/#locate-element-by-xpath

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def executeTest():
    global driver
    driver.get('https://www.nytimes.com/reviews/dining')
    time.sleep(7)  # crude wait for the page to finish loading
    element = driver.find_element_by_xpath('Your_Xpath')  # paste the XPath you copied
    element.click()
    time.sleep(3)

def startWebDriver():
    global driver
    options = Options()
    options.add_argument("--disable-infobars")
    driver = webdriver.Chrome(chrome_options=options)

if __name__ == "__main__":
    startWebDriver()
    executeTest()
    driver.quit()



This is a lazy-loading page. To keep clicking the Show More button, use an infinite loop: scroll to the bottom of the page, wait for the new content to load, collect the hrefs into a list, then click Show More. Compare the list's length before and after each pass; if nothing new was added, break out of the loop.

Code:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://www.nytimes.com/reviews/dining")
# Accept the cookie banner before interacting with the page
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[text()='Accept']"))).click()
listhref = []

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.css-gg4vpm")))
    lenlistbefore = len(listhref)
    for ele in elements:
        if ele.get_attribute("href") in listhref:
            continue
        else:
            listhref.append(ele.get_attribute("href"))

    lenlistafter = len(listhref)

    # No new links were collected on this pass, so we're done
    if lenlistbefore == lenlistafter:
        break

    button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//button[text()='Show More']")))
    driver.execute_script("arguments[0].click();", button)
    time.sleep(2)
print(len(listhref))
print(listhref)
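As a side note, the duplicate check and before/after length comparison can be factored into a small standalone helper. This is purely a stylistic sketch of the same logic, not a required change:

```python
def merge_new_hrefs(collected, hrefs):
    """Append any hrefs not already in `collected`; return True if the list grew."""
    before = len(collected)
    for h in hrefs:
        if h not in collected:
            collected.append(h)
    return len(collected) > before

links = ["a.html", "b.html"]
assert merge_new_hrefs(links, ["b.html", "c.html"])      # c.html is new, keep looping
assert not merge_new_hrefs(links, ["a.html", "b.html"])  # nothing new, time to break
print(links)
```

In the loop, `while merge_new_hrefs(listhref, hrefs): ...` replaces the explicit lenlistbefore/lenlistafter bookkeeping.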

Note: I get a list count of 499.

1 Comment

Thank you so much! This worked after one adjustment to the line WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//button[text()='Accept']"))).click() — I just changed the XPath to "//button[text()='Show More']", which was an easy and convenient fix.
