
I would like to scrape every page of the website below using Selenium, but so far I have only managed to scrape the first page. I am also putting the data into a Pandas DataFrame. How can I do this for all pages of this website? For now, I have:

from selenium import webdriver 
import pandas as pd 

driver = webdriver.Chrome(executable_path=r"C:/Users/Usuario/.spyder-py3/chromedriver.exe")

driver.get("https://www.mercadolivre.com.br/ofertas")
driver.implicitly_wait(3)

tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
precoProduto = driver.find_elements_by_class_name('promotion-item__price')
df = pd.DataFrame()

produtos = []

for x in tituloProduto:
    produtos.append(x.text)
    
preco = []

for x in precoProduto:
    preco.append(x.text)
    
df['produto'] = produtos
df['preco'] = preco

df.head()    


                                             produto     preco
0  Furadeira Parafusadeira Com Impacto 20v 2 Bate...  R$ 34232
1  Sony Playstation 4 Slim 1tb Mega Pack: Ghost O...  R$ 2.549
2  Tablet Galaxy A7 Lite T225 4g Ram 64gb Grafite...  R$ 1.199
3  Smart Tv Philco Ptv55q20snbl Dled 4k 55 110v/220v  R$ 2.799
4  Nintendo Switch 32gb Standard Cor Vermelho-néo...  R$ 2.349

3 Answers

I found that the website you want to scrape has 209 pages in total, and each page can be accessed directly by its page number, e.g. https://www.mercadolivre.com.br/ofertas?page=2, so it should not be too difficult.

One thing you can do is loop 209 times, getting the data from each page. A better approach would be to identify the "next page" button and loop until it is unavailable, but simply using the known page count (209) is easier, so I will use that (a sketch of the button-based approach is shown after the code below).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

driver = webdriver.Chrome(executable_path=r".../chromedriver.exe")

...

# Initialize outside the loop
preco = []
produtos = []

for i in range(1, 210):  # pages 1 to 209
  # Parse each page with the code you already have.
  driver.get('https://www.mercadolivre.com.br/ofertas?page=' + str(i))

  # You may have to wait for each page to load
  wait = WebDriverWait(driver, 10)
  wait.until(ec.visibility_of_element_located((By.CSS_SELECTOR, "a.sc-2vbwj7-22.blyzsR")))
  
  # If you want to speed things up, you can process them in parallel
  # But you should do this only if it's worth it since it will take development time.

  # Get the variables you want
  tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
  precoProduto = driver.find_elements_by_class_name('promotion-item__price')

  for x in tituloProduto:
    produtos.append(x.text)
  for x in precoProduto:
    preco.append(x.text)
  
Store the lists in a DataFrame and do what you want with them.
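
For reference, a minimal sketch of the button-based approach mentioned above could look like the code below. The pagination selector li.andes-pagination__button--next a is an assumption (the real "next" control may be disabled rather than removed on the last page), so both the selector and the stop condition would need to be verified against the live page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.common.exceptions import NoSuchElementException
import pandas as pd

driver = webdriver.Chrome(executable_path=r".../chromedriver.exe")
driver.get('https://www.mercadolivre.com.br/ofertas')

produtos = []
preco = []

while True:
    # Wait for the current page's items to appear, then collect them
    WebDriverWait(driver, 10).until(
        ec.visibility_of_element_located((By.CLASS_NAME, 'promotion-item__title')))
    for x in driver.find_elements(By.CLASS_NAME, 'promotion-item__title'):
        produtos.append(x.text)
    for x in driver.find_elements(By.CLASS_NAME, 'promotion-item__price'):
        preco.append(x.text)

    # Stop when there is no "next page" button left to click
    # (assumed selector -- check it against the live page)
    try:
        next_button = driver.find_element(By.CSS_SELECTOR,
                                          'li.andes-pagination__button--next a')
    except NoSuchElementException:
        break
    next_button.click()

# Store the lists in a DataFrame, as in the question
df = pd.DataFrame({'produto': produtos, 'preco': preco})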

1 Comment

It worked perfectly. Thank you. I thought I had commented on your answer, but it seems it was not saved. I found it great that you used URL concatenation to get the next page instead of the click method.

You can use this code.

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome(
    executable_path=r"C:/Users/Usuario/.spyder-py3/chromedriver.exe")

url = "https://www.mercadolivre.com.br/ofertas?page="
df = pd.DataFrame()
produtos = []
preco = []

for i in range(1, 210):  # pages 1 to 209
    driver.get(url + str(i))
    driver.implicitly_wait(3)

    tituloProduto = driver.find_elements_by_class_name('promotion-item__title')
    precoProduto = driver.find_elements_by_class_name('promotion-item__price')

    for x in tituloProduto:
        produtos.append(x.text)

    for x in precoProduto:
        preco.append(x.text)


df['produto'] = produtos
df['preco'] = preco

print(df)

Hope this is helpful for you. Thanks.
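
Note that the find_elements_by_class_name helpers used here (and in the question) are no longer available in current Selenium 4 releases. If you are on a newer Selenium version, the equivalent calls would look roughly like this, keeping the same class names:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the driver path goes through a Service object
driver = webdriver.Chrome(
    service=Service(r"C:/Users/Usuario/.spyder-py3/chromedriver.exe"))

driver.get("https://www.mercadolivre.com.br/ofertas?page=1")
tituloProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__title')
precoProduto = driver.find_elements(By.CLASS_NAME, 'promotion-item__price')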

This is a Scrapy approach rather than Selenium: what you could do is find the pagination link and assign it to a next_page variable like so:

next_page = response.xpath('XPATH HERE').css('a::attr(href)').extract_first()

and then follow it like so:

yield scrapy.Request(next_page, callback=self.parse)
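
Putting those two fragments together, a minimal Scrapy spider could look roughly like the sketch below. The spider name is made up, the 'XPATH HERE' placeholder from the answer still needs to be filled in, and the item selectors simply reuse the class names from the question; note also that Scrapy does not run JavaScript, so this only works if the listings are present in the raw HTML:

import scrapy


class OfertasSpider(scrapy.Spider):
    # Hypothetical spider name and start URL
    name = "ofertas"
    start_urls = ["https://www.mercadolivre.com.br/ofertas"]

    def parse(self, response):
        # Reuse the class names from the question for the item fields
        titles = response.css(".promotion-item__title::text").getall()
        prices = response.css(".promotion-item__price").xpath("string(.)").getall()
        for produto, preco in zip(titles, prices):
            yield {"produto": produto, "preco": preco}

        # Follow the pagination link, as in the answer above
        # ('XPATH HERE' is the answer's placeholder -- replace it with a real XPath)
        next_page = response.xpath('XPATH HERE').css('a::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)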
