I'm trying to scrape http://www.basketball-reference.com/awards/all_league.html for some analysis. My goal is output like the following:
0 1st Marc Gasol 2014-2015
1 1st Anthony Davis 2014-2015
2 1st LeBron James 2014-2015
3 1st James Harden 2014-2015
4 1st Stephen Curry 2014-2015
5 2nd Pau Gasol 2014-2015
and so on.
This is the code I have so far. Is there any way to do this? Any suggestions or help would be much appreciated.
import pandas as pd
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.basketball-reference.com/awards/all_league.html')
# Clean out HTML entities before parsing
soup = BeautifulSoup(r.text.replace('&nbsp;', ' ').replace('&gt;', '').encode('ascii', 'ignore'), "html.parser")
# Column order matches the row assigned in the loop below
all_league_data = pd.DataFrame(columns=['team', 'player', 'year'])
stw_list = soup.findAll('div', attrs={'class': 'stw'})  # Find all 'stw' divs
for stw in stw_list:
    table = stw.find('table', attrs={'class': 'no_highlight stats_table'})
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if col:
            year = col[0].find(text=True)
            team = col[2].find(text=True)
            player = col[3].find(text=True)  # takes a single cell per row
            all_league_data.loc[len(all_league_data)] = [team, player, year]
all_league_data
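To make the target shape concrete, here is a minimal sketch of the reshaping step on a hand-typed row rather than scraped HTML. It assumes each table row holds the season, a league column, the team label, and then one cell per player. That layout, the "NBA" placeholder, and the helper name `flatten_row` are my assumptions for illustration, not something I have confirmed against the page.

```python
def flatten_row(cells):
    """Turn one wide row into one (team, player, year) record per player.

    Assumed layout (hypothetical): [season, league, team, player, player, ...].
    """
    year, team = cells[0], cells[2]
    # Every cell after the team label is treated as a player; skip empty cells.
    return [(team, player, year) for player in cells[3:] if player]


# Hand-typed example row, not scraped data; "NBA" is a placeholder league value.
row = ["2014-2015", "NBA", "1st",
       "Marc Gasol", "Anthony Davis", "LeBron James",
       "James Harden", "Stephen Curry"]

records = flatten_row(row)
for i, (team, player, year) in enumerate(records):
    print(i, team, player, year)
```

If the real rows look like this, the inner loop in my code would presumably need to iterate over every player cell instead of reading only `col[3]`.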