
I developed this simple web scraping program to scrape newegg.com. I made a for loop to print out the name of the product, price, and shipping cost.

However, when I run the for loop, it doesn't print anything and gives no error. Before writing the for loop, I ran the lines that are now commented out, and they print the details for only one of the products.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

#prod = soup.find('a', class_='item-title').text
#price = soup.find('li', class_='price-current').text.strip()
#ship = soup.find('li', class_='price-ship').text.strip()
#print(prod.strip())
#print(price.strip())
#print(ship)

for info in soup.find_all('div', class_='item-container  '):
    prod = soup.find('a', class_='item-title').text
    price = soup.find('li', class_='price-current').text.strip()
    ship = soup.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    #price.splitlines()[3].replace('\xa0', '')
    print(price.strip())
    print(ship)
  • For starters, you had an extra space in class_='item-container  '. Change that to class_='item-container'. But I'm thinking this page is dynamic, so you'll need to do some extra work to get the data. Commented Jan 4, 2019 at 21:59
  • @chitown88 is correct in pointing out the typo that prevented you from entering the loop. But the loop is also constructed incorrectly, as it repeats the same data for each div that your find_all captures. Commented Jan 4, 2019 at 22:06
  • My fault, it isn't dynamic. You never use info when you iterate (as stated in the solutions below). Commented Jan 4, 2019 at 22:16
  • @Rick, if an answer solves your problem, you should mark it as accepted. Commented Jan 4, 2019 at 22:17
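To see why the extra space in class_ matters, here is a small sketch run against a hypothetical inline HTML snippet (not Newegg's real markup). When class_ is given a string, BeautifulSoup compares it with each individual CSS class and with the whole class attribute as a string, so a value padded with trailing spaces generally matches nothing. It uses the built-in 'html.parser' so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Newegg listing page
html = """
<div class="item-container">first</div>
<div class="item-container is-featured">second</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Trailing spaces make the string equal to neither a single class
# nor the full class attribute, so nothing is found
with_space = soup.find_all("div", class_="item-container  ")

# Without the spaces, the value matches the individual class
# "item-container", even on tags that carry extra classes
without_space = soup.find_all("div", class_="item-container")

print(len(with_space))     # 0
print(len(without_space))  # 2
```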

3 Answers


Write less code, and call find on info (the current product card) instead of soup inside the loop:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text
soup = BeautifulSoup(source, 'lxml')

# Search inside each product card (info), not the whole page (soup)
for info in soup.find_all('div', class_='item-container'):
    print(info.find('a', class_='item-title').text.strip())
    print(info.find('li', class_='price-current').text.strip())
    print(info.find('li', class_='price-ship').text.strip())
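If a product card happens to lack one of these tags (for example, no shipping line), info.find(...) returns None and .text raises AttributeError. A minimal defensive variant, assuming you would rather print an empty string than crash, could look like this. The helper name text_or_empty is hypothetical, and the HTML is an inline stand-in for the real page.

```python
from bs4 import BeautifulSoup


def text_or_empty(parent, name, css_class):
    """Return the stripped text of the first matching tag, or '' if absent.

    Hypothetical helper: find() returns None when nothing matches,
    so guard before touching the text.
    """
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else ""


# Inline stand-in for two product cards; the second has no shipping line
html = """
<div class="item-container">
  <a class="item-title">Console A</a>
  <ul><li class="price-current">$299.99</li><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title">Console B</a>
  <ul><li class="price-current">$349.99</li></ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for info in soup.find_all("div", class_="item-container"):
    rows.append((
        text_or_empty(info, "a", "item-title"),
        text_or_empty(info, "li", "price-current"),
        text_or_empty(info, "li", "price-ship"),
    ))

for row in rows:
    print(row)
```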

Besides the extra-space typo and the indentation, you never actually use info inside your for loop, so it just keeps printing the first item. Use info in your for loop where you had soup.

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container'):
    prod = info.find('a', class_='item-title').text.strip()
    # The price cell spans several lines; the price is usually the second one
    lines = info.find('li', class_='price-current').text.strip().splitlines()
    price = lines[1].replace('\xa0', '') if len(lines) > 1 else ''
    # Sometimes the price is the first line instead, so fall back to it
    if '$' not in price and lines:
        price = lines[0].replace('\xa0', '')
    ship = info.find('li', class_='price-ship').text.strip()
    print(prod)
    print(price)
    print(ship)

Because the body of your for info in soup...: loop calls soup.find(...) rather than info.find(...), every iteration searches the whole page and returns the first occurrence, e.g. of soup.find('a', class_='item-title'). If you use info.find(...), each iteration of the for loop searches only inside the current <div> element.
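The difference is easy to reproduce on a tiny hypothetical listing (inline HTML, not Newegg's real markup): the same loop collects titles once via soup.find and once via info.find.

```python
from bs4 import BeautifulSoup

# Hypothetical three-card listing used only for illustration
html = """
<div class="item-container"><a class="item-title">Item 1</a></div>
<div class="item-container"><a class="item-title">Item 2</a></div>
<div class="item-container"><a class="item-title">Item 3</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

from_soup = []
from_info = []
for info in soup.find_all("div", class_="item-container"):
    # Searches the whole document, so it always returns the first title
    from_soup.append(soup.find("a", class_="item-title").text)
    # Searches only inside the current <div>, so it advances each iteration
    from_info.append(info.find("a", class_="item-title").text)

print(from_soup)  # ['Item 1', 'Item 1', 'Item 1']
print(from_info)  # ['Item 1', 'Item 2', 'Item 3']
```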

Edit: I also found that the price is not always the second item of .splitlines(); sometimes it's the first. I therefore added a check for the '$' sign: if the second item does not contain it, the code uses the first item instead.
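A slightly more general variant of that check, sketched here as an alternative rather than the answer's exact code, is to take the first line that contains a '$' instead of trying index 1 and then index 0. The helper name first_price_line is hypothetical, and the sample strings are made up; the real price-current text on Newegg may be laid out differently.

```python
def first_price_line(text):
    """Return the first line of text containing '$', without non-breaking spaces.

    Returns '' if no line contains a dollar sign.
    """
    for line in text.strip().splitlines():
        if "$" in line:
            return line.replace("\xa0", "").strip()
    return ""


# Made-up examples: price on the second line, then on the first line
sample_a = "\nSale\n$299.99\xa0\n(3 Offers)\n"
sample_b = "$349.99\xa0\n(2 Offers)"

print(first_price_line(sample_a))  # $299.99
print(first_price_line(sample_b))  # $349.99
```

Scanning for the '$' avoids hard-coding an index, at the cost of assuming no other line in the cell contains a dollar sign.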

2 Comments

Can you help me understand .splitlines()[1]? More specifically, the [1].
.splitlines() creates a list, where the text is split into different items of the list. The [1] is the index for this list, meaning that it takes the second item of the list ([0] is the first item). In most cases, the second item of splitlines() contains the actual price.
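A quick illustration of that indexing on a made-up string (Newegg's actual price text may differ):

```python
# Made-up text shaped like a multi-line price cell
text = "Sale\n$299.99\n(3 Offers)"

parts = text.splitlines()
print(parts)     # ['Sale', '$299.99', '(3 Offers)']
print(parts[0])  # Sale (index 0 is the first item)
print(parts[1])  # $299.99 (index 1 is the second item)

# Indexing past the end raises IndexError, so check the length first
single = "$349.99".splitlines()
second = single[1] if len(single) > 1 else None
print(second)    # None
```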

@Rick, you mistakenly added an extra space after the class_ attribute value in the line for info in soup.find_all('div', class_='item-container '): . Check the code below; it will work as you expected.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container '):
    prod = soup.find('a', class_='item-title').text
    price = soup.find('li', class_='price-current').text.strip()
    ship = soup.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    print(price.strip())
    print(ship)

Hope this solves your problem.

4 Comments

Can you help me with something else? Each iteration of the for loop is supposed to pick a new product on Newegg, but it prints the same product's info every time. How can I fix this?
See my answer below
@Rick: Your error (that it always prints the same item) comes from not using info inside the for info in soup...: loop. Because you use soup.find instead of info.find, it keeps returning the first occurrence, e.g. of soup.find('a', class_='item-title'). If you use info.find(...), each iteration of the for loop searches the next <div> element.
@Rick, see the answers by @Niels Henkens and @Loss of human identity for a single page. For all 8 pages, increase the page number from Page-1 up to Page-8 in this link https://www.newegg.com/PS4-Systems/SubCategory/ID-3102/Page-1?PageSize=36&order=BESTMATCH and iterate in the same for loop.
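The pagination suggestion can be sketched as a loop over generated URLs. This is a sketch under assumptions: the URL pattern and the page count of 8 come from the comment and may no longer match the live site, and page_urls and scrape_pages are hypothetical helper names. Fetching and parsing are passed in as callables so the page loop itself does not depend on network access.

```python
# URL pattern taken from the comment above; only the page number changes
BASE_URL = ("https://www.newegg.com/PS4-Systems/SubCategory/ID-3102/"
            "Page-{}?PageSize=36&order=BESTMATCH")


def page_urls(last_page=8):
    """Build the listing URLs for Page-1 through Page-<last_page>."""
    return [BASE_URL.format(n) for n in range(1, last_page + 1)]


def scrape_pages(fetch, parse, last_page=8):
    """Fetch every page and collect what parse() extracts from each.

    fetch: callable taking a URL and returning HTML text
    parse: callable taking HTML text and returning a list of items
    """
    results = []
    for url in page_urls(last_page):
        results.extend(parse(fetch(url)))
    return results
```

With requests and BeautifulSoup, fetch could be lambda url: requests.get(url).text, and parse could wrap the info.find(...) loop from the answers so that it returns a list of (name, price, shipping) tuples instead of printing them.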
