
The tags from a web page are as follows:

<div class="lg_col MT5">
    <p>
        <span class="sp starGryB">4.4</span>
    </p>
    <p class="MT5 UC">
        <span class="gd10gb">141 Ratings</span>
    </p>
</div>

I am trying to retrieve the values "4.4" and "141 Ratings" from every div with the class "lg_col MT5".

The nested for loop that I use isn't working as expected. It seems as if the hierarchy of the tags isn't taken into account.

import requests
import sys
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

def test_function():
    url = "http://www.burrp.com/chennai/search.html?q=buffet"
    source_code = requests.get(url, headers=HEADERS) 
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for tag in soup.select('div.lg_col.MT5'):
        for tag1 in soup.select('span.sp.starGryB'): 
            try:
                print(tag1.string)
            except KeyError:
                pass
        for tag2 in soup.select('span.gd10gb'):
            try:
                print(tag2.string)
            except KeyError:
                pass

test_function()


The expected output is: 4.4 followed by 141 Ratings for each of the div tags in the webpage.

But the actual output is all the starGryB values followed by all the gd10gb values, and that whole listing repeats once for every div.

  • There's no starGryB class in the example you posted. Is it a typo? Also, "does not work as expected" is not very descriptive. How exactly does it behave, and what do you expect from it? Commented Apr 27, 2015 at 18:12
  • Yeah that's a typo. Thanks for pointing that out. The correction has been made. The class too has to be starGryB. Commented Apr 27, 2015 at 18:16
  • The expected output is: 4.4 followed by 141 Ratings for each of the div tags in the webpage Commented Apr 27, 2015 at 18:17

2 Answers

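Use tag.select instead of soup.select if you want to search only within tag and not the entire soup. Calling soup.select inside the loop searches the whole document again on every iteration, which is why every value is printed once per div.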
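As a minimal sketch of that scoping fix, the snippet below parses a small hard-coded sample instead of fetching the live page. The second div in SAMPLE_HTML and the helper name extract_ratings are made up for illustration; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two rating blocks shaped like the markup in the question.
SAMPLE_HTML = """
<div class="lg_col MT5">
    <p><span class="sp starGryB">4.4</span></p>
    <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
    <p><span class="sp starGryB">2.9</span></p>
    <p class="MT5 UC"><span class="gd10gb">27 Ratings</span></p>
</div>
"""

def extract_ratings(html):
    """Return one (score, rating_count) pair per div.lg_col.MT5 block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.select("div.lg_col.MT5"):
        # Search inside this div only, not the whole soup.
        score = div.select_one("span.sp.starGryB")
        count = div.select_one("span.gd10gb")
        results.append((
            score.get_text(strip=True) if score else None,
            count.get_text(strip=True) if count else None,
        ))
    return results

pairs = extract_ratings(SAMPLE_HTML)
for score, count in pairs:
    print(score, count)
```

Because each lookup starts from div rather than soup, every score stays paired with the rating count from the same block.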


1 Comment

thanks blender. :) using tag.select solves the issue.

Not for points.

This is another way to scrape it that avoids the nested loops.

import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"}

url = "http://www.burrp.com/chennai/search.html?q=buffet"
source_code = requests.get(url, headers=HEADERS) 
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")  # name the parser explicitly

tags_1 = soup.find_all('span', class_='sp starGryB')
tags_2 = [tag.parent.parent.select('span.gd10gb') for tag in tags_1]
tags_3 = [tag.parent.parent.parent.select('a.gr24mb.UC') for tag in tags_1]

scores = [score.get_text() for score in tags_1]
ratings = [rating[0].get_text() if len(rating) > 0 else 'NA' for rating in tags_2]
names = [name[0].get_text().strip() for name in tags_3]

# Pair each name with its score and rating count, then print one row per restaurant.
for name, score, rating in zip(names, scores, ratings):
    print(name, score, rating)

Result:

Wild Amazon 2.9 27 Ratings
European Buffet NA NA
Flamingo 2.3 17 Ratings
The Holy Smoke 2.9 13 Ratings
Snow Park 2.6 14 Ratings
Dhabba Express 2.7 11 Ratings
The Yellow Chilli 2.7 6 Ratings
The Piano, The Savera Hotel 2.5 6 Ratings
Roasts & Grills, Green Park Hotel 2.3 6 Ratings
[Finished in 0.9s]
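The chained .parent.parent lookups above depend on the exact nesting depth of the page. A possible variation, sketched here against a made-up sample rather than the live site, uses Beautiful Soup's find_parent to walk up to the enclosing div by class. The sample markup and the "NA" fallback mirror the answer's approach and are assumptions, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second block has no rating-count span.
SAMPLE_HTML = """
<div class="lg_col MT5">
    <p><span class="sp starGryB">4.4</span></p>
    <p class="MT5 UC"><span class="gd10gb">141 Ratings</span></p>
</div>
<div class="lg_col MT5">
    <p><span class="sp starGryB">2.3</span></p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for score_tag in soup.select("span.sp.starGryB"):
    # Walk up to the nearest enclosing div that carries the lg_col class.
    container = score_tag.find_parent("div", class_="lg_col")
    count_tag = container.select_one("span.gd10gb") if container else None
    rows.append((
        score_tag.get_text(strip=True),
        count_tag.get_text(strip=True) if count_tag else "NA",
    ))

for score, count in rows:
    print(score, count)
```

This keeps working if an extra wrapper element is added between the span and the div, at the cost of assuming the lg_col class stays on the container.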
