
I am trying to scrape basic information from Google. The code I am using is the following. Unfortunately it does not move to the next page and I cannot figure out why. I am using Selenium with Google Chrome as the browser (not Firefox). Could you please tell me what is wrong with my code?

from selenium import webdriver
from bs4 import BeautifulSoup, Tag
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome()
driver.get('https://www.google.com/advanced_search?q=google&tbs=cdr:1,cd_min:3/4/2020,cd_max:3/4/2020&hl=en')

search = driver.find_element_by_name('q')
search.send_keys('tea')
search.submit()

soup = BeautifulSoup(driver.page_source,'lxml')
result_div = soup.find_all('div', attrs={'class': 'g'})

titles = []

while True:
    next_page_btn = driver.find_elements_by_xpath("//a[@id='pnnext']")
    for r in result_div:
        if len(next_page_btn) < 1:
            print("no more pages left")
            break
        else:
            try:
                title = None
                title = r.find('h3')

                if isinstance(title, Tag):
                    title = title.get_text()
                    print(title)
                if title != '':
                    titles.append(title)
            except:
                continue

    element = WebDriverWait(driver, 5).until(expected_conditions.element_to_be_clickable((By.ID, 'pnnext')))
    driver.execute_script("return arguments[0].scrollIntoView();", element)
    element.click()

1 Answer

I set q in the query string to an empty string, used as_q rather than q as the name of the search box, and reordered your code a bit. I also added a page limit to stop it running forever.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome()
# Leave q empty; the date-range filter (tbs=cdr) stays in the query string.
driver.get('https://www.google.com/advanced_search?q=&tbs=cdr:1,cd_min:3/4/2020,cd_max:3/4/2020&hl=en')

# The advanced-search input is named as_q, not q.
search = driver.find_element_by_name('as_q')
search.send_keys('tea')
search.submit()

titles = []
page_limit = 5
page = 0

while True:
    # Parse the current results page before deciding whether to move on.
    soup = BeautifulSoup(driver.page_source, 'lxml')
    result_div = soup.find_all('div', attrs={'class': 'g'})
    for r in result_div:
        for title in r.find_all('h3'):
            title = title.get_text()
            print(title)
            titles.append(title)
    # Stop when there is no "Next" link or the page limit is reached.
    next_page_btn = driver.find_elements_by_id('pnnext')
    if len(next_page_btn) == 0 or page > page_limit:
        break
    element = WebDriverWait(driver, 5).until(expected_conditions.element_to_be_clickable((By.ID, 'pnnext')))
    driver.execute_script("return arguments[0].scrollIntoView();", element)
    element.click()
    page += 1
driver.quit()
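
Note that Selenium 4.3 and later removed the find_element_by_* helper methods, so on a current driver the lookups above will raise AttributeError. A minimal sketch of the same lookups with By locators, assuming the as_q and pnnext selectors still match Google's markup:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.google.com/advanced_search?q=&tbs=cdr:1,cd_min:3/4/2020,cd_max:3/4/2020&hl=en')

# Selenium 4 style: find_element(By.NAME, ...) replaces find_element_by_name(...)
search = driver.find_element(By.NAME, 'as_q')

# ...and find_elements(By.ID, ...) replaces find_elements_by_id(...)
next_page_btn = driver.find_elements(By.ID, 'pnnext')

The rest of the loop (the WebDriverWait, scrollIntoView, and click) works unchanged, since it already uses the By.ID locator style.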