Amazon web scraping with Python [closed]

Question

Closed. This question needs debugging details. It is not currently accepting answers.

Closed 7 months ago.

I am scraping an Amazon search results page with Python and saving the results to a CSV file. The code runs without errors, but some product names come back without their first word. For example, I only get "Schuko Steckdose, EU-Standard 1 Fach Unterputz Mit 2,5D Curved Glas Platte, Wandsteckdose Weiß 86 * 86mm", but it should be "TAWOIA Schuko Steckdose, EU-Standard 1 Fach Unterputz Mit 2,5D Curved Glas Platte, Wandsteckdose Weiß 86 * 86mm".

Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep

headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36',
    'Accept-Language': 'de-DE, de;q=0.5'
}

search_query = 'steckdose'.replace(' ', '+')
base_url = 'https://www.amazon.de/s?k={0}'.format(search_query)

items = []
for i in range(1, 2):
    url = base_url + '&page={0}'.format(i)
    print('Processing {0}...'.format(url))
    response = requests.get(url, headers=headers)

    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        continue

    soup = BeautifulSoup(response.content, 'html.parser')
    results = soup.find_all('div', {'data-component-type': 's-search-result'})

    if not results:
        print('No results found')
        continue

    for result in results:
        try:
            # Find the <a> tag first
            link = result.find('a', class_='a-link-normal s-line-clamp-4 s-link-style a-text-normal')
            if link:
                # Extract product name from the <span> tag inside <h2>
                product_name = link.find('h2').find('span').text.strip()  # Get text from <span>
                product_url = 'https://www.amazon.de' + link['href']
                items.append([product_name, product_url])
        except AttributeError:
            continue

    sleep(1.5)

df = pd.DataFrame(items, columns=['product', 'product url'])
df.to_csv('{0}.csv'.format(search_query), index=False)
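One hypothesis worth checking (it is not confirmed in this thread): Amazon may render the brand in its own `<h2>` next to the product link rather than inside it, in which case `link.find('h2')` only ever sees the second part of the title. The sketch below assumes that layout and joins the text of every `<h2>` in a search result. It uses the standard library's `html.parser` so it runs offline; `HeadingCollector`, `full_title` and the sample markup are hypothetical and only imitate the structure assumed here, not Amazon's verified HTML.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the text of every <h2> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while the parser is inside an <h2>
        self._current = []   # text pieces of the <h2> being read
        self.headings = []   # finished <h2> texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == 'h2' and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and collapse runs of whitespace
                text = ' '.join(' '.join(self._current).split())
                if text:
                    self.headings.append(text)
                self._current = []

    def handle_data(self, data):
        if self._depth:
            self._current.append(data)


def full_title(result_html):
    """Join the texts of all <h2> elements of one search result."""
    parser = HeadingCollector()
    parser.feed(result_html)
    parser.close()
    return ' '.join(parser.headings)


# Hypothetical markup: brand in a separate <h2>, title inside the link
sample = '''
<div data-component-type="s-search-result">
  <div><h2 class="a-size-mini"><span>TAWOIA</span></h2></div>
  <a class="a-link-normal a-text-normal" href="/dp/EXAMPLE">
    <h2><span>Schuko Steckdose, EU-Standard 1 Fach Unterputz</span></h2>
  </a>
</div>
'''

print(full_title(sample))
# → TAWOIA Schuko Steckdose, EU-Standard 1 Fach Unterputz
```

With BeautifulSoup, a roughly equivalent expression inside the question's loop would be `' '.join(h2.get_text(strip=True) for h2 in result.find_all('h2'))`. Amazon's markup changes often, so any selector should be re-checked against the HTML actually returned.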

Hey Cincinnatus, I am referring to this page: amazon.de/s?k=steckdose, not to amazonE :) – ellie_in_wonderland, commented May 5 at 9:35
How to debug small programs: Stack Overflow is a question-and-answer site for specific questions about actual code; “I wrote some buggy code that I can’t fix” is not a question, it’s a story, and not even an interesting story. – user16540390, commented May 5 at 9:35
Compare the HTML you receive for products that include the first word with the HTML you receive for the products that don't. There may be some difference there that your code isn't accounting for. – Tangentially Perpendicular, commented May 5 at 9:36
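One way to follow the last comment's suggestion is a line-by-line diff of the two kinds of result. This is a minimal sketch using the standard library's `difflib`; `diff_html` and the two sample snippets are hypothetical. In the question's loop, `result.prettify()` returns a result's HTML with one tag per line, which tends to diff more readably than `str(result)`.

```python
import difflib


def diff_html(good_html, bad_html):
    """Return a unified diff of two HTML snippets, compared line by line."""
    diff = difflib.unified_diff(
        good_html.splitlines(),
        bad_html.splitlines(),
        fromfile='complete_title.html',
        tofile='missing_word.html',
        lineterm='',  # the input lines carry no newlines, so emit none either
    )
    return '\n'.join(diff)


# Hypothetical snippets: one result whose title is complete, one where it is not
complete = '<h2><span>TAWOIA</span></h2>\n<a><h2><span>Schuko Steckdose</span></h2></a>'
missing = '<a><h2><span>Schuko Steckdose</span></h2></a>'

# Lines starting with '-' exist only in the complete result
print(diff_html(complete, missing))
```

An empty diff means the markup is identical, so the difference would have to come from somewhere else (for example, from what the server sent in a different request).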

ellie_in_wonderland · Accepted Answer · 2025-05-05 09:57:03Z

The problem was the q value in the Accept-Language header. I changed it from 0.5 to 0, and now it works. Before:

'Accept-Language': 'de-DE, de;q=0.5'

After:

'Accept-Language': 'de-DE, de;q=0'
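Some background on why the q value matters: in HTTP content negotiation each language range can carry a weight `q` between 0 and 1, defaulting to 1, and RFC 9110 defines `q=0` as "not acceptable". So `'de-DE, de;q=0.5'` reads as "prefer de-DE, plain de is also fine", while `'de-DE, de;q=0'` explicitly rejects plain `de`. Why Amazon then returns the full title is not documented; it is the behavior observed in this answer and may change. The hypothetical helpers below are a minimal sketch that makes the weights visible; they skip validation of tags and malformed q values.

```python
def parse_accept_language(header):
    """Split an Accept-Language value into (language_range, q) pairs."""
    entries = []
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        language, _, params = part.partition(';')
        q = 1.0  # a range without an explicit q weight defaults to 1
        for param in params.split(';'):
            name, _, value = param.strip().partition('=')
            if name.strip().lower() == 'q':
                q = float(value)
        entries.append((language.strip(), q))
    # Highest weight first; sorted() is stable, so ties keep header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def acceptable_languages(header):
    """Return only the language ranges the client accepts (q > 0)."""
    return [language for language, q in parse_accept_language(header) if q > 0]


# Original header → [('de-DE', 1.0), ('de', 0.5)]
print(parse_accept_language('de-DE, de;q=0.5'))
# Header from the answer → [('de-DE', 1.0), ('de', 0.0)]
print(parse_accept_language('de-DE, de;q=0'))
# → ['de-DE']
print(acceptable_languages('de-DE, de;q=0'))
```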
