Web scraping program for loop returns nothing

Question

I developed this simple web scraping program to scrape newegg.com. I made a for loop to print out the name of the product, price, and shipping cost.

However, when I run the for loop it doesn't print anything and doesn't give me any error. Before writing the for loop, I ran the lines that are now commented out, and they printed the details for only one of the products.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

#prod = soup.find('a', class_='item-title').text
#price = soup.find('li', class_='price-current').text.strip()
#ship = soup.find('li', class_='price-ship').text.strip()
#print(prod.strip())
#print(price.strip())
#print(ship)

for info in soup.find_all('div', class_='item-container  '):
    prod = soup.find('a', class_='item-title').text
    price = soup.find('li', class_='price-current').text.strip()
    ship = soup.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    #price.splitlines()[3].replace('\xa0', '')
    print(price.strip())
    print(ship)

For starters, you had an extra space in class_='item-container  '. Change that to class_='item-container '. But I'm thinking this page is dynamic, so you'll need to do some extra work to get the data.
– chitown88, Commented Jan 4, 2019 at 21:59
@chitown88 is correct in pointing out the typo that prevented you from entering the loop. But the loop is also constructed incorrectly, as it repeats the same data for each div that your find_all captures.
– chb, Commented Jan 4, 2019 at 22:06
My fault. It isn't dynamic. You never use info when you iterate (as stated in the solutions below).
– chitown88, Commented Jan 4, 2019 at 22:16
@Rick, if the answer solves your problem, you should mark it as accepted.
– Employee, Commented Jan 4, 2019 at 22:17

Employee · Accepted Answer · 2019-01-04 22:12:27Z

2

Write less code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text    
soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container '):
    print(info.find('a', class_='item-title').text)
    print(info.find('li', class_='price-current').text.strip())        
    print(info.find('li', class_='price-ship').text.strip())

edited Jan 4, 2019 at 22:12

answered Jan 4, 2019 at 22:06

Employee


Niels Henkens · Accepted Answer · 2019-01-04 22:32:09Z

2

Besides the 'space' typo and the indentation, you didn't actually use info in your for loop. This will just keep printing the first item. Use info in your for loop where you had soup.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container'):
    prod = info.find('a', class_='item-title').text.strip()
    # Split the price text into lines once; the price is usually on the second line.
    lines = info.find('li', class_='price-current').text.strip().splitlines()
    price = lines[1].replace(u'\xa0', '')
    if u'$' not in price:
        # Otherwise fall back to the first line.
        price = lines[0].replace(u'\xa0', '')
    ship = info.find('li', class_='price-ship').text.strip()
    print(prod)
    print(price)
    print(ship)

Because the body of your for info in soup...: loop calls soup.find(...) rather than info.find(...), every iteration searches the whole document and returns the first occurrence of, e.g., soup.find('a', class_='item-title'). If you call info.find(...) instead, each iteration searches only the current <div> element.
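To make the difference concrete, here is a minimal sketch that runs offline. The HTML string and the product names are made up for illustration (they only mimic the class names from the question), and it uses Python's built-in 'html.parser' instead of 'lxml' so nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class names used in the question.
html = """
<div class="item-container">
  <a class="item-title">Console A</a>
  <li class="price-current">$299.99</li>
</div>
<div class="item-container">
  <a class="item-title">Console B</a>
  <li class="price-current">$349.99</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", class_="item-container")

# Searching from soup always starts at the top of the document,
# so every iteration finds the same first title.
from_soup = [soup.find("a", class_="item-title").text for _ in containers]

# Searching from info only looks inside the current container.
from_info = [info.find("a", class_="item-title").text for info in containers]

print(from_soup)
print(from_info)
```

The first list repeats the first title once per container, while the second list contains one title per container.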

Edit: I also found that the price is not always the second item when you use .splitlines(); sometimes it's the first. I therefore added a check to see whether the item contains the '$' sign. If not, the code uses the first list item.
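A possible variation on that check, as a sketch only: rather than testing fixed positions, scan every line and take the first one that contains '$'. The helper name pick_price and the sample strings are hypothetical, not taken from the Newegg page.

```python
def pick_price(text):
    """Return the first line of text that contains '$', or None if no line does."""
    for line in text.strip().splitlines():
        if "$" in line:
            # Drop non-breaking spaces and surrounding whitespace.
            return line.replace("\xa0", "").strip()
    return None


# Made-up samples: the price on the second line, on the first line, and missing.
print(pick_price("|\n$299.99\xa0\n(3 Offers)"))
print(pick_price("$19.99\xa0\nSale"))
print(pick_price("Out of stock"))
```

Returning None when no line contains '$' lets the caller decide what to do with items that have no listed price, instead of failing on a missing index.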

edited Jan 4, 2019 at 22:32

answered Jan 4, 2019 at 22:04

Niels Henkens


2 Comments

Rick Over a year ago

Can you help me understand the .splitlines()[1]? More specifically, the '[1]'.

Niels Henkens Over a year ago

.splitlines() creates a list, where the text is split into different items of the list. The [1] is the index for this list, meaning that it takes the second item of the list ([0] is the first item). In most cases, the second item of splitlines() contains the actual price.
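A small illustration of that indexing, using a made-up string in place of the real price text (the actual text returned by the page may look different):

```python
# Made-up text standing in for what .text might return for a price element.
raw = "  |\n$299.99\xa0\n(3 Offers)  "

# strip() removes the outer whitespace; splitlines() splits at each line break.
lines = raw.strip().splitlines()
print(lines)

# Index 0 is the first list item, index 1 is the second.
first = lines[0]
second = lines[1].replace("\xa0", "")
print(first)
print(second)
```

Here the list has three items, so lines[1] is the middle one. If the text had only a single line, lines[1] would raise an IndexError.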

Dev · Accepted Answer · 2019-01-04 21:59:47Z

-2

@Rick, you mistakenly added an extra space after the attribute value in this line: for info in soup.find_all('div', class_='item-container '): Try the code below; it should work as you expected.

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.newegg.com/PS4-Systems/SubCategory/ID-3102').text

soup = BeautifulSoup(source, 'lxml')

for info in soup.find_all('div', class_='item-container '):
    prod = soup.find('a', class_='item-title').text
    price = soup.find('li', class_='price-current').text.strip()
    ship = soup.find('li', class_='price-ship').text.strip()
    print(prod.strip())
    print(price.strip())
    print(ship)

Hope this solves your problem.

answered Jan 4, 2019 at 21:59

Dev


4 Comments

Rick Over a year ago

Can you help me with something else? Each time the for loop runs, it is supposed to pick a new product on Newegg. However, it prints the same info for one product every time. How can I fix this?

Niels Henkens Over a year ago

See my answer.

Niels Henkens Over a year ago

@Rick: Your error (that it always prints the same item) comes from not using info inside the for info in soup...: loop. Because you call soup.find instead of info.find, every iteration searches the whole document and returns the first occurrence of, e.g., soup.find('a', class_='item-title'). If you call info.find(...) instead, each iteration searches only the current <div> element.

Dev Over a year ago

@Rick, see the answers by @Niels Henkens and @Loss of human identity for a single page. For all 8 pages, increase the page number in this link from Page-1 up to Page-8, https://www.newegg.com/PS4-Systems/SubCategory/ID-3102/Page-1?PageSize=36&order=BESTMATCH, and run the same for loop on each page.
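One way to build those page URLs, as a rough sketch: the URL pattern and the page count of 8 come from the comment above, while the names BASE and page_urls are made up for illustration. The fetching step is left as comments so the snippet runs without network access.

```python
# URL pattern from the comment above, with the page number as a placeholder.
BASE = ("https://www.newegg.com/PS4-Systems/SubCategory/ID-3102/"
        "Page-{page}?PageSize=36&order=BESTMATCH")


def page_urls(last_page=8):
    """Return the listing URLs for pages 1 through last_page."""
    return [BASE.format(page=n) for n in range(1, last_page + 1)]


urls = page_urls()
for url in urls:
    print(url)
    # For each URL you would then fetch and parse the page, for example:
    # source = requests.get(url).text
    # soup = BeautifulSoup(source, 'lxml')
    # ...and run the same for info in soup.find_all(...) loop on it.
```

The page count is an assumption carried over from the comment; the site may have a different number of pages at any given time.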
