Scraping data from html table, selecting rows with certain attributes

Question

I am scraping information from the following website: "http://www.mobygames.com/game/wheelman/view-moby-score". Here is my code:

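import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}  # assumed value; the post does not show its headers
url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"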
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')
for row in table[1:]:
    print(row)
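    # select() returns a list of tags (it has no .get), so read colspan from each matching cell in this row
    x = [td.get("colspan") for td in row.select('td[class="left"]')]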

My desired output is something like this:

platform     total_votes rating_category score  total_score
PlayStation3 None        None            None   None
Windows      6           Acting          4.2    4.1
Windows      6           AI              3.7    4.1
Windows      6           Gameplay        4.0    4.1

The main problem is getting the platform name into the platform column for each corresponding row. How can I do that?

Keyur Potdar · Accepted Answer · 2018-04-04 16:58:35Z


Notice that a row that starts a new platform has 3 columns, while the other rows have 2. You can use that to detect when the platform changes.
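A minimal sketch of this column-count rule in plain Python, assuming each row's cell texts have already been extracted (for example with row.find_all('td')) into a list of strings. The function name label_rows is hypothetical, and the sample rows reuse values from the output shown in this answer.

```python
def label_rows(rows):
    """Attach the current platform and its totals to each rating row.

    Each row is a list of cell texts. A 3-cell row is treated as a platform
    header [platform, total_votes, total_score]; a 2-cell row is treated as
    a rating row [rating_category, score]. Other row shapes are skipped.
    """
    records = []
    platform = total_votes = total_score = None
    for cells in rows:
        if len(cells) == 3:
            # New platform: remember its summary values for the rows that follow
            platform, total_votes, total_score = cells
        elif len(cells) == 2:
            category, score = cells
            records.append((platform, total_votes, category, score, total_score))
    return records


rows = [
    ["Windows", "6", "4.1"],
    ["Acting", "4.2"],
    ["AI", "3.7"],
    ["Xbox 360", "5", "3.5"],
    ["Acting", "3.8"],
]
for record in label_rows(rows):
    print(record)
```

The platform is simply carried forward in a variable until the next 3-cell row replaces it, which is the same idea the answer's loop uses.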

Rows like the PlayStation one contain a cell (a <td> tag) with the attributes colspan="2" and class="center". Check for that cell first to handle such rows.
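If the PlayStation row has only two <td> elements (the platform cell plus one cell spanning two columns, as colspan="2" suggests), a column count alone would mistake it for a rating row; that is presumably why the check comes first. The sketch below mirrors that test in plain Python. The dict-per-cell representation, the helper names is_no_score_row and no_score_record, and the placeholder cell text are all hypothetical; with BeautifulSoup the equivalent test is row.find('td', colspan='2', class_='center').

```python
def is_no_score_row(cells):
    """Return True if any cell has colspan="2" and the CSS class "center".

    Each cell is a dict of its HTML attributes plus a "text" key. As in
    BeautifulSoup, "class" is a list, and a match needs only one of its values.
    """
    return any(
        cell.get("colspan") == "2" and "center" in cell.get("class", [])
        for cell in cells
    )


def no_score_record(cells):
    # As in the answer: the first cell holds the platform name, the rest is None
    return (cells[0]["text"], None, None, None, None)


# Hypothetical PlayStation 3 row; the spanning cell's text is a placeholder,
# not taken from the real page
playstation_row = [
    {"text": "PlayStation 3"},
    {"text": "(placeholder)", "colspan": "2", "class": ["center"]},
]
rating_row = [{"text": "Acting"}, {"text": "4.2"}]

for row in (playstation_row, rating_row):
    if is_no_score_row(row):
        print(no_score_record(row))
    else:
        print("rating row:", [cell["text"] for cell in row])
```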

Code:

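import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}  # assumed value; the post does not show its headers
url_credit = "http://www.mobygames.com/game/wheelman/view-moby-score"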
response = requests.get(url_credit, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
table = soup.find("table", class_="reviewList table table-striped table-condensed table-hover").select('tr[valign="top"]')

platform = ''
total_votes, total_score = None, None
for row in table[1:]:
    # Handle rows like PlayStation 3 that have no scores: they contain a <td> with colspan="2" class="center"
    if row.find('td', colspan='2', class_='center'):
        platform = row.find('td').text
        total_score, total_votes = None, None
        print('{} | {} | {} | {} | {}'.format(platform, total_votes, None, None, total_score))
        continue

    cols = row.find_all('td')
    # A 3-column row starts a new platform: remember its name and totals
    if len(cols) == 3:
        platform = cols[0].text
        total_votes = cols[1].text
        total_score = cols[2].text
        continue
    # Otherwise it is a 2-column rating row (category, score) for the current platform
    print('{} | {} | {} | {} | {}'.format(platform, total_votes, cols[0].text, cols[1].text, total_score))

Output:

PlayStation 3 | None | None | None | None
Windows | 6 |       Acting | 4.2 | 4.1
Windows | 6 |       AI | 3.7 | 4.1
Windows | 6 |       Gameplay | 4.0 | 4.1
Windows | 6 |       Graphics | 4.2 | 4.1
Windows | 6 |       Personal Slant | 4.3 | 4.1
Windows | 6 |       Sound / Music | 4.3 | 4.1
Windows | 6 |       Story / Presentation | 3.8 | 4.1
Xbox 360 | 5 |       Acting | 3.8 | 3.5
Xbox 360 | 5 |       AI | 3.2 | 3.5
Xbox 360 | 5 |       Gameplay | 3.4 | 3.5
Xbox 360 | 5 |       Graphics | 3.6 | 3.5
Xbox 360 | 5 |       Personal Slant | 3.6 | 3.5
Xbox 360 | 5 |       Sound / Music | 3.4 | 3.5
Xbox 360 | 5 |       Story / Presentation | 3.8 | 3.5

Note: Where the code prints, you would instead save the values to whatever list or DataFrame you are using. I'm only using print() to show how the platform variable changes as needed.
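One way to follow that note is to collect one dict per output line instead of printing. This sketch assumes the five values per row are gathered as tuples in the same order the answer prints them; the helper name to_records is hypothetical. It also strips the leading whitespace visible in the category text of the output above and converts numeric strings to int or float, keeping None as None.

```python
COLUMNS = ("platform", "total_votes", "rating_category", "score", "total_score")


def clean(value):
    """Strip whitespace and convert numeric text to int or float; keep None."""
    if value is None:
        return None
    value = value.strip()
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    return value


def to_records(rows):
    # Each row is a (platform, total_votes, category, score, total_score) tuple
    return [dict(zip(COLUMNS, map(clean, row))) for row in rows]


records = to_records([
    ("PlayStation 3", None, None, None, None),
    ("Windows", "6", "      Acting", "4.2", "4.1"),
])
for record in records:
    print(record)
```

If pandas is available, pandas.DataFrame(records) turns this list into a table whose columns are the dict keys, matching the layout asked for in the question.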

edited Apr 4, 2018 at 16:58

answered Apr 4, 2018 at 13:22

Keyur Potdar


4 Comments

edyvedy13 Over a year ago

Thank you very much. Actually, I asked a similar question yesterday but couldn't apply the answer to this new data.

edyvedy13 Over a year ago

In fact, I need to scrape more than one page, and there might be many cases like PlayStation. Is there any way I can keep None values for PlayStation 3?

Keyur Potdar Over a year ago

Can you give an example link where a case like PlayStation occurs? It'll be easier to generalize after considering multiple cases.

Keyur Potdar Over a year ago

Have a look at the edit. If this doesn't work for any other page, please share that link.
