Python nested for loops to retrieve values from tags by CSS class

Question

The tags from a web page are as follows:

<div class="lg_col MT5">
    <p>
        <span class="sp starGryB">4.4</span>
    </p>
    <p class="MT5 UC">
        <span class="gd10gb">141 Ratings</span>
    </p>
</div>

I am trying to retrieve the values "4.4" and "141 Ratings" from every div whose class is "lg_col MT5".

The nested for loop that I use isn't working as expected. It seems as if the hierarchy of the tags isn't taken into account.

import requests
import sys
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

def test_function():
    url = "http://www.burrp.com/chennai/search.html?q=buffet"
    source_code = requests.get(url, headers=HEADERS) 
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for tag in soup.select('div.lg_col.MT5'):
        for tag1 in soup.select('span.sp.starGryB'): 
            try:
                print(tag1.string)
            except KeyError:
                pass
        for tag2 in soup.select('span.gd10gb'):
            try:
                print(tag2.string)
            except KeyError:
                pass

test_function()

The expected output is 4.4 followed by 141 Ratings for each of the div tags in the webpage.

The actual output is all the starGryB values followed by all the gd10gb values, and that whole list is printed again for every div.

There's no starGryB class in the example you posted. Is it a typo? Also, "does not work as expected" is not very descriptive. How exactly does it work, and what do you expect from it?
– J0HN, Commented Apr 27, 2015 at 18:12
Yeah, that's a typo. Thanks for pointing that out. The correction has been made. The class should indeed be starGryB.
– RDPD, Commented Apr 27, 2015 at 18:16
The expected output is: 4.4 followed by 141 Ratings for each of the div tags in the webpage.
– RDPD, Commented Apr 27, 2015 at 18:17

Blender · Accepted Answer · 2015-04-27 18:17:16Z

1

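Use tag.select instead of soup.select if you want to search only within tag and not the entire soup. soup.select always searches the whole document, so the inner loops return every match on the page on each pass of the outer loop.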
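
A minimal sketch of the difference, assuming a small inline HTML sample modeled on the question's markup instead of the live page. The second div, its values, and the two helper function names are made up for illustration, and the stdlib "html.parser" backend is an assumption as well:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question; the values are illustrative.
HTML = """
<div class="lg_col MT5">
  <p><span class="sp starGryB">4.4</span></p>
  <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
  <p><span class="sp starGryB">3.1</span></p>
  <p class="MT5 UC"><span class="gd10gb">20 Ratings</span></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def unscoped_values(soup):
    """The question's approach: the inner loops search the whole document."""
    values = []
    for tag in soup.select("div.lg_col.MT5"):
        for tag1 in soup.select("span.sp.starGryB"):
            values.append(tag1.get_text())
        for tag2 in soup.select("span.gd10gb"):
            values.append(tag2.get_text())
    return values


def scoped_values(soup):
    """The fix: the inner loops search only inside the current div."""
    values = []
    for tag in soup.select("div.lg_col.MT5"):
        for tag1 in tag.select("span.sp.starGryB"):
            values.append(tag1.get_text())
        for tag2 in tag.select("span.gd10gb"):
            values.append(tag2.get_text())
    return values


print(unscoped_values(soup))  # every span on the page, once per div
print(scoped_values(soup))    # score then rating count, div by div
```

With two divs in the sample, the unscoped version collects eight values while the scoped one collects four, in the order the question expects.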

Thanks, Blender. :) Using tag.select solves the issue.
– RDPD

WGS · Accepted Answer · 2015-04-27 18:33:33Z

Not for points.

This is another way to scrape it that avoids the nested loops: start from each score span, climb to its enclosing block, and look up the related values from there.

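import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

url = "http://www.burrp.com/chennai/search.html?q=buffet"
source_code = requests.get(url, headers=HEADERS)
plain_text = source_code.text
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(plain_text, "html.parser")

# Every score span on the page; each one anchors the lookups for its listing.
tags_1 = soup.find_all('span', class_='sp starGryB')
# Climb from each score span to its ancestors and search only inside them.
tags_2 = [tag.parent.parent.select('span.gd10gb') for tag in tags_1]
tags_3 = [tag.parent.parent.parent.select('a.gr24mb.UC') for tag in tags_1]

scores = [score.get_text() for score in tags_1]
# Fall back to 'NA' when a listing has no ratings span.
ratings = [rating[0].get_text() if len(rating) > 0 else 'NA' for rating in tags_2]
names = [name[0].get_text().strip() for name in tags_3]

# zip pairs the three lists position by position, one row per listing.
tags = zip(names, scores, ratings)
for a, b, c in tags:
    print(a, b, c)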

Result:

Wild Amazon 2.9 27 Ratings
European Buffet NA NA
Flamingo 2.3 17 Ratings
The Holy Smoke 2.9 13 Ratings
Snow Park 2.6 14 Ratings
Dhabba Express 2.7 11 Ratings
The Yellow Chilli 2.7 6 Ratings
The Piano, The Savera Hotel 2.5 6 Ratings
Roasts & Grills, Green Park Hotel 2.3 6 Ratings
[Finished in 0.9s]
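
The 'NA' fallback can also be made explicit by building one row per listing with select_one, which returns None when nothing matches. A rough sketch, assuming inline sample HTML in place of the live page. The `text_or_na` helper and the sample values are hypothetical, and the restaurant name is left out because the question's snippet does not show where it sits relative to the div:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second listing has no score or ratings span.
HTML = """
<div class="lg_col MT5">
  <p><span class="sp starGryB">2.9</span></p>
  <p class="MT5 UC"><span class="gd10gb">27 Ratings</span></p>
</div>
<div class="lg_col MT5">
  <p></p>
  <p class="MT5 UC"></p>
</div>
"""


def text_or_na(parent, selector):
    """Return the text of the first match inside parent, or 'NA' if none."""
    found = parent.select_one(selector)
    return found.get_text(strip=True) if found is not None else "NA"


soup = BeautifulSoup(HTML, "html.parser")

# One (score, ratings) row per div, each looked up inside that div only.
rows = [
    (text_or_na(div, "span.sp.starGryB"), text_or_na(div, "span.gd10gb"))
    for div in soup.select("div.lg_col.MT5")
]

for score, rating in rows:
    print(score, rating)
```

Because each row is assembled from a single div, a missing span cannot shift the values of later listings out of alignment.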
