Problem scraping reviews with Beautiful Soup and Selenium

Question

I am currently scraping this URL: https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1

I want to scrape all the reviews of the product, but I am getting an error. Any help is really appreciated, thank you :)

My code:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup as soup
import time
from selenium.webdriver.chrome.options import Options


url = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1'

chrome_options = Options()
#chrome_options.add_argument("--headless")
browser = webdriver.Chrome('/Users/e5/fyp/chromedriver', 
chrome_options=chrome_options)
browser.get(url)
time.sleep(0.1)


d = soup(requests.get('https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1').text, 'html.parser')
results = list(map(int, filter(None, [i.text for i in d.find_all('button', {'class':'next-pagination-item'})])))
print (results)
for i in range(min(results), max(results)+1):

    browser.find_element_by_xpath('//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}]').click()
    page_soups = soup(browser.page_source, 'html.parser')
    headline = page_soups.findAll('div',attrs={"class":"item-content"})

    for item in headline:
        top = item.div
        text_headlines = top.text
        print(text_headlines)

My error:

InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}] because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}]' is not a valid XPath expression.
  (Session info: chrome=69.0.3497.100)
  (Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Windows NT 10.0.17134 x86_64)
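
The message above quotes the XPath with a literal `{i}` still in it. The string passed to `find_element_by_xpath` has no `f` prefix and no `.format` call, so the loop index is never substituted and `button[{i}]` is not valid XPath. A minimal sketch of building the selector with the index filled in, assuming the rest of the path matches the page; `XPATH_TEMPLATE` and `pagination_xpath` are hypothetical names used only for illustration:

```python
# Template copied from the question; {i} is the placeholder for the button index.
XPATH_TEMPLATE = '//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}]'


def pagination_xpath(i):
    """Return the XPath for the i-th pagination button (hypothetical helper)."""
    # str.format substitutes the index; an f-string would work the same way.
    return XPATH_TEMPLATE.format(i=i)


print(pagination_xpath(2))
```

The returned string would then be passed to `browser.find_element_by_xpath(...)` in place of the unformatted literal.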

Have you figured it out? Did you try my answer?

– Moshe Slavin, Commented Oct 14, 2018 at 4:09

Sohan Das · Accepted Answer · 2018-10-14 08:53:08Z

1

Simply use their JSON API; there is no need for Selenium or BeautifulSoup.

import requests

# Fetch the first three pages of reviews from the JSON endpoint.
for page_no in range(1, 4):
    url = ('https://my.lazada.com.my/pdp/review/getReviewList?'
           'itemId=253761547&pageSize=5&filter=0&sort=0&pageNo=' + str(page_no))
    req = requests.get(url)
    data = req.json()
    # Each entry in data['model']['items'] is one review.
    for item in data['model']['items']:
        buyerName = item['buyerName']
        reviewContent = item['reviewContent']
        print(buyerName, reviewContent)

edited Oct 14, 2018 at 8:53

answered Oct 13, 2018 at 15:49


3 Comments

moxasya Over a year ago

Wow, that's simple, thanks. But how do I capture only the 'reviewContent' from the JSON?

moxasya Over a year ago

It worked when I tried it just now, but since then every run shows this error: ----> 9 data = req.json() JSONDecodeError: Expecting value: line 2 column 1 (char 2). Thank you, and sorry to keep troubling you. I'm new to JSON ;)

Sohan Das Over a year ago

Do not make too many requests in a short time; if you do, the site will block you. Use a time delay between requests to avoid being blocked. Try again after some time or from a different IP and it will work.
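
The two points in these comments (collecting only `reviewContent`, and pausing between requests) can be sketched together. This is a minimal sketch, not a definitive implementation: it assumes the endpoint and the `model` / `items` layout shown in the answer, and it treats a `JSONDecodeError` as a sign that the response was not JSON (for example a block page), as the comment above suggests. The helper names, the 2-second default delay, and the use of the standard-library `urllib` instead of `requests` are choices made here for illustration.

```python
import json
import time
import urllib.request

# Endpoint taken from the answer above.
BASE = 'https://my.lazada.com.my/pdp/review/getReviewList'


def build_url(item_id, page_no, page_size=5):
    """Build the review-list URL for one page (hypothetical helper)."""
    return (f'{BASE}?itemId={item_id}&pageSize={page_size}'
            f'&filter=0&sort=0&pageNo={page_no}')


def fetch_text(url):
    """Download the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


def review_contents(item_id, pages, delay=2.0, fetch=fetch_text):
    """Collect only the 'reviewContent' field from the first `pages` pages."""
    contents = []
    for page_no in range(1, pages + 1):
        try:
            data = json.loads(fetch(build_url(item_id, page_no)))
        except json.JSONDecodeError:
            # The body was not JSON; stop instead of sending more requests.
            break
        for item in data['model']['items']:
            contents.append(item['reviewContent'])
        # Pause between requests (the delay value is an arbitrary assumption).
        time.sleep(delay)
    return contents
```

Calling `review_contents(253761547, 3)` would request the same three pages as the answer's loop and return a list of review texts. The `fetch` parameter exists so the function can be tried with canned responses without contacting the site.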
