I have recently been using the code attached. For the past few weeks it worked completely fine and always produced results. However, when I ran it today, it didn’t work for some reason. Could you please help and suggest a solution to the problem?

import requests, json
from bs4 import BeautifulSoup

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {"q": "dji", "hl": "en", 'gl': 'us', 'tbm': 'shop'}

response = requests.get("https://www.google.com/search",
                        params=params,
                        headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
# list with two dict() combined
shopping_data = []
shopping_results_dict = {}


for shopping_result in soup.select('.sh-dgr__content'):
    title = shopping_result.select_one('.Lq5OHe.eaGTj h4').text
    product_link = f"https://www.google.com{shopping_result.select_one('.Lq5OHe.eaGTj')['href']}"
    source = shopping_result.select_one('.IuHnof').text
    price = shopping_result.select_one('span.kHxwFf span').text

    try:
        rating = shopping_result.select_one('.Rsc7Yb').text
    except:
        rating = None

    try:
        reviews = shopping_result.select_one('.Rsc7Yb').next_sibling.next_sibling
    except:
        reviews = None

    try:
        delivery = shopping_result.select_one('.vEjMR').text
    except:
        delivery = None



    shopping_results_dict.update({
        'shopping_results': [{
            'title': title,
            'link': product_link,
            'source': source,
            'price': price,
            'rating': rating,
            'reviews': reviews,
            'delivery': delivery,
        }]
    })

    shopping_data.append(dict(shopping_results_dict))

print(title)

Image of the error produced

1 Answer

Because soup.select('.sh-dgr__content') in the line for shopping_result in soup.select('.sh-dgr__content'): could not find any matching elements, it returns an empty list. Therefore the body of the for loop is never executed, and Python moves straight on to the code after the loop.

title is only defined when the body of the for loop executes at least once, so print(title) after the loop fails because the name was never assigned.

You should make sure you are using a correct method and selector to find your element(s).
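The behaviour described above can be reproduced without any network access. The following is a minimal sketch, not the asker's actual scraper: the plain list of dicts stands in for whatever soup.select(...) returns, and the function names collect_titles and title_after_loop are made up for illustration. It shows why a name assigned only inside a loop body may not exist afterwards, and one way to guard against that by collecting results in a list and checking whether it is empty.

```python
def title_after_loop(results):
    """Mimic the question's pattern: read `title` after the loop ends."""
    for result in results:
        title = result["title"]  # only assigned if the loop body runs
    try:
        return title
    except NameError:
        # Inside a function this is an UnboundLocalError, a NameError subclass.
        return None


def collect_titles(results):
    """Safer pattern: build a list, then check whether anything matched."""
    titles = [result["title"] for result in results]
    if not titles:
        print("No elements matched; check the selector or the returned HTML.")
    return titles


# `[]` is a stand-in for soup.select(...) finding nothing.
empty = []
found = [{"title": "DJI Mini"}, {"title": "DJI Air"}]

print(title_after_loop(empty))   # the loop body never ran
print(collect_titles(empty))
print(collect_titles(found))
```

With this shape, an empty result is reported explicitly instead of surfacing later as an undefined name.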

6 Comments

Oh okay, I figured the for loop wasn’t looping. Do you know any way I could make the for loop work, or make it find elements?
@HarrisonCox Apparently you need another way to find your element... I haven't seen the HTML of the page, but for CSS classes, make sure you spell them correctly. You could use XPath, which gives you more flexibility. I should also mention that if the element has an id attribute, that should be your first choice.
I’ve looked through the HTML of the website that I’m trying to access and still don’t understand the problem. Sorry to be a pain, but I’ve attached a link to a website that I used to help me write this code in the first place. If you have time, could you look through it and see if you can find a fix? Many thanks. dev.to/dmitryzub/scrape-google-shopping-with-python-49ad
@HarrisonCox I can only show you the path. With print(response.request.url) you can see which URL you are actually requesting. Then write the HTML to a file and try to find your element inside that file. If you inspect the saved HTML file, you will see why your code doesn't work... Many websites load their content dynamically, and Beautiful Soup can't help you with that because it doesn't render JS. Some websites also constantly change the names of their tags and classes (or the page structure) to make things harder for scrapers...
@HarrisonCox If it loads dynamically, use Selenium; if the structure keeps changing, locate elements by their position rather than by tag or class names.
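The debugging steps suggested in the comments (save the returned HTML, then check whether the selectors match anything in it) could be sketched roughly as follows. This is an illustration under assumptions: the helper names save_html and count_matches and the file name response.html are hypothetical, the sample HTML is invented, and the built-in html.parser is used instead of lxml only to avoid an extra dependency.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def save_html(html, path="response.html"):
    """Write the fetched HTML to disk so it can be inspected in an editor."""
    Path(path).write_text(html, encoding="utf-8")
    return path


def count_matches(html, selectors):
    """Return how many elements each CSS selector matches in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return {selector: len(soup.select(selector)) for selector in selectors}


# Invented sample standing in for response.text; the class from the
# question is deliberately absent, as it would be if the markup had changed.
sample_html = '<div class="other"><h4>DJI Mini</h4></div>'

counts = count_matches(sample_html, [".sh-dgr__content", "h4"])
for selector, count in counts.items():
    print(f"{selector}: {count} match(es)")
```

Against the real response, one would call save_html(response.text) and print(response.request.url) right after the request, then pass response.text and the selectors from the scraper to count_matches. A count of zero for '.sh-dgr__content' would suggest that the class name changed or that the content is rendered by JavaScript.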