
I would like to scrape every page of the website below using Selenium, but so far I have only managed to scrape the first page. I am also putting the data into a Pandas DataFrame. How can I do this for all pages of this website? For now, I have:

from selenium import webdriver 
import pandas as pd 

driver = webdriver.Chrome(executable_path=r"C:/Users/Usuario/.spyder-py3/chromedriver.exe")

driver.get("https://www.mercadolivre.com.br/ofertas")
driver.implicitly_wait(3)

tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
precoProduto = driver.find_elements_by_class_name('promotion-item__price')
df = pd.DataFrame()

produtos = []

for x in tituloProduto:
    produtos.append(x.text)
    
preco = []

for x in precoProduto:
    preco.append(x.text)
    
df['produto'] = produtos
df['preco'] = preco

df.head()    


                                             produto     preco
0  Furadeira Parafusadeira Com Impacto 20v 2 Bate...  R$ 34232
1  Sony Playstation 4 Slim 1tb Mega Pack: Ghost O...  R$ 2.549
2  Tablet Galaxy A7 Lite T225 4g Ram 64gb Grafite...  R$ 1.199
3  Smart Tv Philco Ptv55q20snbl Dled 4k 55 110v/220v  R$ 2.799
4  Nintendo Switch 32gb Standard Cor Vermelho-néo...  R$ 2.349

3 Answers

I found that the website you want to scrape has 209 pages in total, and each page can be accessed directly by its page number, e.g. https://www.mercadolivre.com.br/ofertas?page=2, so it should not be too difficult.

One thing you can do is loop 209 times, getting the data from each page. A better approach would be to identify the "next page" button and loop until it is unavailable, but simply using the known page count (209) is easier, so I will use that (a sketch of the button-based approach is shown after the code below).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

driver = webdriver.Chrome(executable_path=r".../chromedriver.exe")

...

# Initialize outside the loop
preco = []
produtos = []

for i in range(1, 210):  # pages 1 to 209
  # Parse each page with the code you already have.
  driver.get('https://www.mercadolivre.com.br/ofertas?page=' + str(i))

  # You may have to wait for each page to load
  wait = WebDriverWait(driver, 10)
  wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, "a.sc-2vbwj7-22.blyzsR")))
  
  # If you want to speed things up, you can process them in parallel
  # But you should do this only if it's worth it since it will take development time.

  # Get the variables you want
  tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
  precoProduto = driver.find_elements_by_class_name('promotion-item__price')

  for x in tituloProduto:
    produtos.append(x.text)
  for x in precoProduto:
    preco.append(x.text)
  
Store the lists in a DataFrame and do what you want with them.
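
For reference, a minimal sketch of the button-based approach mentioned above could look like the code below. The pagination selector li.andes-pagination__button--next a is an assumption (the real "next" control may be disabled rather than removed on the last page), so both the selector and the stop condition would need to be verified against the live page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import NoSuchElementException
import pandas as pd

driver = webdriver.Chrome(executable_path=r".../chromedriver.exe")
driver.get('https://www.mercadolivre.com.br/ofertas')

produtos = []
preco = []

while True:
    # Wait for the current page's items to appear, then collect them
    WebDriverWait(driver, 10).until(
        ec.visibility_of_element_located((By.CLASS_NAME, 'promotion-item__title')))
    for x in driver.find_elements(By.CLASS_NAME, 'promotion-item__title'):
        produtos.append(x.text)
    for x in driver.find_elements(By.CLASS_NAME, 'promotion-item__price'):
        preco.append(x.text)

    # Stop when there is no "next page" button left to click
    # (assumed selector -- check it against the live page)
    try:
        next_button = driver.find_element(By.CSS_SELECTOR,
                                          'li.andes-pagination__button--next a')
    except NoSuchElementException:
        break
    next_button.click()

# Store the lists in a DataFrame, as in the question
df = pd.DataFrame({'produto': produtos, 'preco': preco})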

1 Comment

It worked perfectly. Thank you. I thought I had commented on your answer, but it seems it was not saved. I found it great that you used URL concatenation to get the next page instead of the click method.

You can use this code.

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome(
    executable_path=r"C:/Users/Usuario/.spyder-py3/chromedriver.exe")

url = "https://www.mercadolivre.com.br/ofertas?page="
df = pd.DataFrame()
produtos = []
preco = []

for i in range(1, 210):  # pages 1 to 209
    driver.get(url + str(i))
    driver.implicitly_wait(3)

    tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
    precoProduto = driver.find_elements_by_class_name('promotion-item__price')

    for x in tituloProduto:
        produtos.append(x.text)

    for x in precoProduto:
        preco.append(x.text)


df['produto'] = produtos
df['preco'] = preco

print(df)

Hope this is helpful for you. Thanks.
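
Note that the find_elements_by_class_name helpers used here (and in the question) are no longer available in current Selenium 4 releases. If you are on a newer Selenium version, the equivalent calls would look roughly like this, keeping the same class names:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through a Service object
driver = webdriver.Chrome(
    service=Service(r"C:/Users/Usuario/.spyder-py3/chromedriver.exe"))

driver.get("https://www.mercadolivre.com.br/ofertas?page=1")
tituloProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__title')
precoProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__price')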

This is a Scrapy approach rather than Selenium: what you could do is find the pagination link and assign it to a next_page variable like so:

next_page = response.xpath('XPATH HERE').css('a::attr(href)').extract_first()

and then follow it like so:

yield scrapy.Request(next_page, callback=self.parse)
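
Putting those two fragments together, a minimal Scrapy spider could look roughly like the sketch below. The spider name is made up, the 'XPATH HERE' placeholder from the answer still needs to be filled in, and the item selectors simply reuse the class names from the question; note also that Scrapy does not run JavaScript, so this only works if the listings are present in the raw HTML:

import scrapy


class OfertasSpider(scrapy.Spider):
    # Hypothetical spider name and start URL
    name = "ofertas"
    start_urls = ["https://www.mercadolivre.com.br/ofertas"]

    def parse(self, response):
        # Reuse the class names from the question for the item fields
        titles = response.css(".promotion-item__title::text").getall()
        prices = response.css(".promotion-item__price").xpath("string(.)").getall()
        for produto, preco in zip(titles, prices):
            yield {"produto": produto, "preco": preco}

        # Follow the pagination link, as in the answer above
        # ('XPATH HERE' is the answer's placeholder -- replace it with a real XPath)
        next_page = response.xpath('XPATH HERE').css('a::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)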
