Right now I am scraping this URL: https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1

I want to scrape all the reviews of the product, but I am getting an error. Any help is really appreciated, thank you :)

My code:

import requests
from selenium import webdriver
from bs4 import BeautifulSoup as soup
import time
from selenium.webdriver.chrome.options import Options


url = 'https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1'

chrome_options = Options()
#chrome_options.add_argument("--headless")
browser = webdriver.Chrome('/Users/e5/fyp/chromedriver', 
chrome_options=chrome_options)
browser.get(url)
time.sleep(0.1)


d = soup(requests.get('https://www.lazada.com.my/products/xiaomi-mi-a1-4gb-ram-32gb-rom-i253761547-s336359472.html?spm=a2o4k.searchlistcategory.list.64.71546883QBZiNT&search=1').text, 'html.parser')
results = list(map(int, filter(None, [i.text for i in d.find_all('button', {'class':'next-pagination-item'})])))
print (results)
for i in range(min(results), max(results)+1):

    browser.find_element_by_xpath('//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}]').click()
    page_soups = soup(browser.page_source, 'html.parser')
    headline = page_soups.findAll('div',attrs={"class":"item-content"})

    for item in headline:
        top = item.div
        text_headlines = top.text
        print(text_headlines)

My error:

InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}] because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="module_product_review"]/div/div[3]/div[2]/div/div/button[{i}]' is not a valid XPath expression.
  (Session info: chrome=69.0.3497.100)
  (Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7),platform=Windows NT 10.0.17134 x86_64)
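
The `{i}` appears literally in the error message, which suggests the XPath string was never formatted: without an `f` prefix, Python passes the braces through unchanged. A minimal sketch of building the per-page XPath, assuming the button layout from the code above still holds (the helper name `review_page_xpath` is made up for illustration):

```python
def review_page_xpath(page_index: int) -> str:
    """Return the XPath of the pagination button at the given 1-based position."""
    # The f prefix makes Python substitute {page_index}; a plain string
    # would send the literal text "{page_index}" to the browser.
    return (
        '//*[@id="module_product_review"]'
        f'/div/div[3]/div[2]/div/div/button[{page_index}]'
    )


print(review_page_xpath(2))
```

The returned string can be passed to `browser.find_element_by_xpath(...)` in place of the hard-coded one inside the loop.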
  • Have you figured it out? Did you try my answer? Commented Oct 14, 2018 at 4:09

1 Answer

Simply use their JSON API; there is no need for Selenium or BeautifulSoup.

import requests

# Fetch the first three pages of reviews from the JSON endpoint.
for page_no in range(1, 4):
    url = ('https://my.lazada.com.my/pdp/review/getReviewList?'
           'itemId=253761547&pageSize=5&filter=0&sort=0&pageNo=' + str(page_no))
    req = requests.get(url)
    data = req.json()
    # Each entry in data['model']['items'] is one review.
    for item in data['model']['items']:
        buyerName = item['buyerName']
        reviewContent = item['reviewContent']
        print(buyerName, reviewContent)
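
To keep only the review text, the parsed response can be reduced to a plain list. This is a rough sketch, assuming the response shape used in the code above (`model` → `items` → `reviewContent`); the function name `review_texts` and the sample data are invented for illustration and are not real Lazada output:

```python
from typing import Any, Dict, List


def review_texts(payload: Dict[str, Any]) -> List[str]:
    """Collect only the review text from one parsed response page."""
    # Assumed shape: payload['model']['items'] is a list of dicts, each
    # with a 'reviewContent' key. Missing or empty values are skipped.
    items = (payload.get('model') or {}).get('items') or []
    return [item['reviewContent'] for item in items if item.get('reviewContent')]


# Hand-written sample mimicking that shape (not real data).
sample = {
    'model': {
        'items': [
            {'buyerName': 'A***', 'reviewContent': 'Good phone'},
            {'buyerName': 'B***', 'reviewContent': None},
        ]
    }
}
print(review_texts(sample))
```

In the loop above, this would be called as `review_texts(req.json())` for each page.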

3 Comments

Wow, that's simple, thanks. But how do I capture only the 'reviewContent' from the JSON?
It worked when I tried it just now, but afterwards it keeps showing this error whenever I run it: ----> 9 data = req.json() JSONDecodeError: Expecting value: line 2 column 1 (char 2). Thank you, and sorry to keep troubling you. I'm new to JSON ;)
Do not make too many requests in a short time; if you do, the site will block you. Add a time delay between requests to avoid being blocked. Try again after some time or from a different IP and it should work.
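
The delay advice and the `JSONDecodeError` from the comments above can be handled together. The following is only a sketch under assumptions: the names `parse_items` and `fetch_reviews` are hypothetical, the 2-second default delay is an arbitrary guess rather than a documented limit, and the HTTP call is passed in as `get_text` so the loop does not depend on any particular client.

```python
import json
import time
from typing import Callable, Iterator, List, Optional

BASE_URL = ('https://my.lazada.com.my/pdp/review/getReviewList?'
            'itemId=253761547&pageSize=5&filter=0&sort=0&pageNo=')


def parse_items(body: str) -> Optional[List[dict]]:
    """Return the review items in a response body, or None if it is not JSON."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (for example an HTML block page) is what makes
        # req.json() raise "Expecting value".
        return None
    if not isinstance(data, dict):
        return None
    return (data.get('model') or {}).get('items') or []


def fetch_reviews(get_text: Callable[[str], str], pages: int,
                  delay: float = 2.0) -> Iterator[dict]:
    """Yield review items page by page, pausing between requests."""
    for page_no in range(1, pages + 1):
        items = parse_items(get_text(BASE_URL + str(page_no)))
        if items is None:
            break  # probably blocked; stop rather than keep requesting
        yield from items
        time.sleep(delay)  # pause before the next request
```

With `requests`, this could be used as `for item in fetch_reviews(lambda url: requests.get(url).text, pages=3): print(item['reviewContent'])`.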
