Convert multiple columns into single based on another column value python

Question

I'm trying to scrape http://www.basketball-reference.com/awards/all_league.html for some analysis, and my goal is output like the following:

0 1st Marc Gasol 2014-2015
1 1st Anthony Davis 2014-2015
2 1st LeBron James 2014-2015
3 1st James Harden 2014-2015
4 1st Stephen Curry 2014-2015
5 2nd Pau Gasol 2014-2015
and so on.

This is the code I have so far. Is there any way to do this? Any suggestions or help would be much appreciated.

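import requests
import pandas as pd
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup=BeautifulSoup(r.text.replace('&nbsp;','').replace('&gt;','').encode('ascii','ignore'),"html.parser")
# Column order matches the [team, player, year] rows appended in the loop below
all_league_data = pd.DataFrame(columns = ['team','player','year'])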


stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:
            year = col[0].find(text=True)
            team = col[2].find(text=True)
            player = col[3].find(text=True)
            all_league_data.loc[len(all_league_data)] = [team, player, year]
print(all_league_data)

damio · Accepted Answer · 2016-04-14 22:21:53Z

Looks like your code should work fine, but here's a working version without pandas:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
soup=BeautifulSoup(r.text.replace('&nbsp;','').replace('&gt;','').encode('ascii','ignore'),"html.parser")
all_league_data = []

stw_list = soup.findAll('div', attrs={'class': 'stw'}) # Find all 'stw's'
for stw in stw_list:
    table = stw.find('table', attrs = {'class':'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:
            year = col[0].find(text=True)
            team = col[2].find(text=True)
            player = col[3].find(text=True)
            all_league_data.append([team, player, year])

for i, line in enumerate(all_league_data):
    print(i, *line)
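Note that col[3] is only the first player cell in each row, so the loop above records one player per team. Here is a minimal sketch of emitting one line per player instead. It assumes every cell from index 3 onward is a player followed by a position letter, as the table output elsewhere on this page suggests. The sample rows are copied from that output rather than fetched, and flatten_rows is a hypothetical helper name, not part of any library.

```python
# Sample rows copied from the table output shown on this page; in the real
# scraper each list would be built from the text of the <td> cells in a row.
sample_rows = [
    ['2014-15', 'NBA', '1st', 'Marc Gasol C', 'Anthony Davis F',
     'LeBron James F', 'James Harden G', 'Stephen Curry G'],
    ['2014-15', 'NBA', '2nd', 'Pau Gasol C', 'DeMarcus Cousins C',
     'LaMarcus Aldridge F', 'Chris Paul G', 'Russell Westbrook G'],
]


def flatten_rows(rows):
    """Return one [team, player, year] record per player cell (hypothetical helper)."""
    records = []
    for cells in rows:
        year, team = cells[0], cells[2]
        # Assumption: every cell from index 3 onward holds "Name Position".
        for cell in cells[3:]:
            player = cell.rsplit(' ', 1)[0]  # drop the trailing position letter
            records.append([team, player, year])
    return records


all_league_data = flatten_rows(sample_rows)
for i, line in enumerate(all_league_data):
    print(i, *line)
```

In the scraper itself, the same inner loop over col[3:] would replace the single col[3] lookup.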

Padraic Cunningham · Accepted Answer · 2016-04-14 22:32:36Z

You are already using pandas, so use read_html:

import pandas as pd

all_league_data = pd.read_html('http://www.basketball-reference.com/awards/all_league.html')
print(all_league_data)

This returns a list of DataFrames, one per table found on the page; the first one holds the All-League data:

  In [7]:  print(all_league_data[0].dropna().head(5))
         0    1    2                 3                   4  \
0  2014-15  NBA  1st      Marc Gasol C     Anthony Davis F   
1  2014-15  NBA  2nd       Pau Gasol C  DeMarcus Cousins C   
2  2014-15  NBA  3rd  DeAndre Jordan C        Tim Duncan F   
4  2013-14  NBA  1st     Joakim Noah C      LeBron James F   
5  2013-14  NBA  2nd   Dwight Howard C     Blake Griffin F   

                     5                6                    7  
0       LeBron James F   James Harden G      Stephen Curry G  
1  LaMarcus Aldridge F     Chris Paul G  Russell Westbrook G  
2      Blake Griffin F   Kyrie Irving G      Klay Thompson G  
4       Kevin Durant F   James Harden G         Chris Paul G  
5         Kevin Love F  Stephen Curry G        Tony Parker G

From there it is trivial to rearrange the data however you like or to drop certain columns. read_html also takes a few arguments, such as attrs, that you can use to select a specific table; the pandas documentation for read_html covers them.
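As one possible way to do that rearranging, here is a sketch that turns the five player columns into a single column with DataFrame.melt. It uses two rows copied from the output above rather than fetching the page, and it assumes the integer column labels 0 to 7 that read_html produced there. On the live data the input would presumably be all_league_data[0].dropna(). The names wide and tidy are only illustrative.

```python
import pandas as pd

# Two rows copied from the output above; the default integer column labels
# (0-7) mirror what read_html produced there.
wide = pd.DataFrame([
    ['2014-15', 'NBA', '1st', 'Marc Gasol C', 'Anthony Davis F',
     'LeBron James F', 'James Harden G', 'Stephen Curry G'],
    ['2014-15', 'NBA', '2nd', 'Pau Gasol C', 'DeMarcus Cousins C',
     'LaMarcus Aldridge F', 'Chris Paul G', 'Russell Westbrook G'],
])

# Keep year (column 0) and team (column 2) as identifiers and stack the
# five player columns (3-7) into a single 'player' column.
tidy = wide.melt(id_vars=[0, 2], value_vars=[3, 4, 5, 6, 7], value_name='player')
tidy = tidy.rename(columns={0: 'year', 2: 'team'})

# Assumption: each cell is "Name Position", so drop the last token.
tidy['player'] = tidy['player'].str.rsplit(' ', n=1).str[0]

# Newest season first, then team, then the original left-to-right column order.
tidy = tidy.sort_values(['year', 'team', 'variable'], ascending=[False, True, True])
tidy = tidy[['team', 'player', 'year']].reset_index(drop=True)
print(tidy)
```

melt keeps the year and team next to every player, which is what the one-row-per-player layout in the question needs.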
