I am trying to scrape this site: https://www.basketball-reference.com/players/a/

My end goal is to build a dataframe of that table, along with a new column that holds each player's index. For example, for the top player this would be abdelal01.

My current attempt:

from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/players/a"
# this is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

# column names come from the <th> cells of the first (header) row
headers = [th.getText() for th in soup.findAll('tr')[0].findAll('th')]

rows = soup.findAll('tr')

# the player name sits in the <th> cell of each row
player_names = [[th.getText() for th in row.findAll('th')]
                for row in rows]

names = pd.DataFrame(player_names, columns=headers)
names.head(10)

# the remaining columns sit in the <td> cells of each row
player_stats = [[td.getText() for td in row.findAll('td')]
                for row in rows]

stats = pd.DataFrame(player_stats, columns=headers[1:])
stats['Player'] = names['Player']

Essentially this rebuilds the whole table, but without the URL to each player. Is there an easier way to do this than building two dataframes, given that the player name and the other columns sit in different HTML elements (th and td)?

And what is the best way to collect the index to the player?

1 Answer

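The simplest way to extract table data is with the pandas package, which also makes the result easy to manipulate afterwards.

The pd.read_html() function parses every <table> element on a page and returns a list of DataFrames, so the [0] below selects the first table.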

import pandas as pd
df = pd.read_html('https://www.basketball-reference.com/players/a/')[0]
df

Output

     Player                From  To    Pos  Ht    Wt   Birth Date         Colleges
0    Alaa Abdelnaby        1991  1995  F-C  6-10  240  June 24, 1968      Duke
1    Zaid Abdul-Aziz       1969  1978  C-F  6-9   235  April 7, 1946      Iowa State
2    Kareem Abdul-Jabbar*  1970  1989  C    7-2   225  April 16, 1947     UCLA
3    Mahmoud Abdul-Rauf    1991  2001  G    6-1   162  March 9, 1969      LSU
4    Tariq Abdul-Wahad     1998  2003  F    6-6   223  November 3, 1974   Michigan, San Jose State
...  ...                   ...   ...   ...  ...   ...  ...                ...
161  Dennis Awtrey         1971  1982  C    6-10  235  February 22, 1948  Santa Clara
162  Gustavo Ayón          2012  2014  C    6-10  250  April 1, 1985      NaN
163  Jeff Ayres            2010  2016  F    6-9   240  April 29, 1987     Arizona State
164  Deandre Ayton         2019  2020  C    6-11  250  July 23, 1998      Arizona
165  Kelenna Azubuike      2007  2012  G    6-5   220  December 16, 1983  Kentucky

Player column

df['Player']

Output

0            Alaa Abdelnaby
1           Zaid Abdul-Aziz
2      Kareem Abdul-Jabbar*
3        Mahmoud Abdul-Rauf
4         Tariq Abdul-Wahad
               ...         
161           Dennis Awtrey
162            Gustavo Ayón
163              Jeff Ayres
164           Deandre Ayton
165        Kelenna Azubuike
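
By default read_html() keeps only the text of each cell, so the link that carries the player index (abdelal01 in the question) is dropped. One way to keep it is to walk the rows once and read the text and the link together. The following is a minimal sketch using only the standard library. It runs on a small inline sample rather than the live page, and it assumes each player link has the form /players/a/<index>.html. The sample markup, the class name PlayerTableParser and the column name player_id are illustrative, not taken from the site.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath

# Hypothetical sample modelled on the page's table; the real markup may differ.
# Only the index abdelal01 comes from the question; the second one is assumed.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Player</th><th>From</th><th>To</th></tr></thead>
  <tbody>
    <tr><th><a href="/players/a/abdelal01.html">Alaa Abdelnaby</a></th><td>1991</td><td>1995</td></tr>
    <tr><th><a href="/players/a/abdulza01.html">Zaid Abdul-Aziz</a></th><td>1969</td><td>1978</td></tr>
  </tbody>
</table>
"""


class PlayerTableParser(HTMLParser):
    """Collect the header names and, per body row, the cell texts and link index."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.records = []
        self._cells = None      # cell texts of the row currently being read
        self._player_id = None  # index taken from the row's first link
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._player_id = [], None
        elif tag in ("th", "td") and self._cells is not None:
            self._in_cell = True
            self._cells.append("")
        elif tag == "a" and self._in_cell and self._player_id is None:
            href = dict(attrs).get("href")
            if href:
                # "/players/a/abdelal01.html" -> "abdelal01"
                self._player_id = PurePosixPath(href).stem

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            if not self.headers:
                self.headers = self._cells  # the first row is the header row
            else:
                record = dict(zip(self.headers, self._cells))
                record["player_id"] = self._player_id
                self.records.append(record)
            self._cells = None


parser = PlayerTableParser()
parser.feed(SAMPLE_HTML)
records = parser.records  # one dict per player row, including player_id
```

Passing records to pd.DataFrame() gives a single dataframe with the extra player_id column, so the th and td cells no longer need separate dataframes. The same loop can be written with BeautifulSoup by reading row.find('a')['href'] while iterating over the rows.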