
How would I use bs4 to get the "Per Game Stats" table on this page and turn it into a pandas DataFrame?

I have already tried

url = 'https://www.basketball-reference.com/leagues/NBA_2021.html'
page = requests.get(url)
page
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

and am stuck from there.

Thanks.

2 Answers

1

Use pd.read_html:

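from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2021.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# Select the table by its id, then let pandas parse just that table.
table = soup.find('table', id='per_game-team')

# Recent pandas versions deprecate passing a literal HTML string,
# so wrap it in StringIO.
df = pd.read_html(StringIO(str(table)))[0]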

The table you want has the id 'per_game-team'. Use the inspector from your browser's developer tools to find it.
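
If you would rather discover the id from Python than from the browser inspector, one option is to list the ids of every table the parser sees. This is a minimal sketch that uses a small inline HTML string as a stand-in for the real page; the markup, the id 'other-table', and the helper name list_table_ids are made up for illustration:

```python
from bs4 import BeautifulSoup

# Stand-in markup for illustration only; the real page is much larger.
SAMPLE_HTML = """
<html><body>
  <table id="per_game-team"><tr><td>a</td></tr></table>
  <table id="other-table"><tr><td>b</td></tr></table>
  <table><tr><td>no id</td></tr></table>
</body></html>
"""


def list_table_ids(html):
    """Return the id attribute of every <table> that has one."""
    soup = BeautifulSoup(html, 'html.parser')
    # id=True keeps only the tables that actually carry an id attribute.
    return [t['id'] for t in soup.find_all('table', id=True)]


table_ids = list_table_ids(SAMPLE_HTML)
print(table_ids)
```

On the real page you would pass page.content instead of SAMPLE_HTML. If soup.find returns None for an id you expected, that table is not in the HTML that requests received.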

Output:

>>> df.head(10)
     Rk                     Team   G     MP  ...  BLK   TOV    PF    PTS
0   1.0         Milwaukee Bucks*  72  240.7  ...  4.6  13.8  17.3  120.1
1   2.0           Brooklyn Nets*  72  241.7  ...  5.3  13.5  19.0  118.6
2   3.0      Washington Wizards*  72  241.7  ...  4.1  14.4  21.6  116.6
3   4.0               Utah Jazz*  72  241.0  ...  5.2  14.2  18.5  116.4
4   5.0  Portland Trail Blazers*  72  240.3  ...  5.0  11.1  18.9  116.1
5   6.0            Phoenix Suns*  72  242.8  ...  4.3  12.5  19.1  115.3
6   7.0           Indiana Pacers  72  242.4  ...  6.4  13.5  20.2  115.3
7   8.0          Denver Nuggets*  72  242.8  ...  4.5  13.5  19.1  115.1
8   9.0     New Orleans Pelicans  72  242.1  ...  4.4  14.6  18.0  114.6
9  10.0    Los Angeles Clippers*  72  240.0  ...  4.1  13.2  19.2  114.0

[10 rows x 25 columns]

1

pandas's .read_html() is the way to go here (it can use BeautifulSoup under the hood for parsing, depending on which parser is installed or selected with flavor). And since it can fetch a URL by itself, you can simplify the solution Corral provided to:

import pandas as pd

url = 'https://www.basketball-reference.com/leagues/NBA_2021.html'
df = pd.read_html(url, attrs={'id': 'per_game-team'})[0]

But since you are specifically asking how to build the DataFrame with bs4, here is that solution too.

The basic steps are:

  1. Get the <table> tag.
  2. From the table object, get the header names from the <th> tags under the <thead> tag.
  3. Iterate through the rows (<tr> tags) and get the <td> content from each row. The rank in each row sits in a <th>, so it is read separately.

Code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/leagues/NBA_2021.html'
response = requests.get(url)

# Step 1: get the table tag by its id.
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id': 'per_game-team'})

# Step 2: the column names are the <th> cells under <thead>.
headers = [x.text for x in table.find('thead').find_all('th')]

# Step 3: build one list of cell texts per <tr> in <tbody>.
data = []
table_body_rows = table.find('tbody').find_all('tr')
for row in table_body_rows:
    # The rank is stored in a <th>; the remaining cells are <td>.
    rank = [row.find('th').text]
    row_data = rank + [x.text for x in row.find_all('td')]
    data.append(row_data)

df = pd.DataFrame(data, columns=headers)

Output:

print(df)
    Rk                     Team   G     MP    FG  ...  STL  BLK   TOV    PF    PTS
0    1         Milwaukee Bucks*  72  240.7  44.7  ...  8.1  4.6  13.8  17.3  120.1
1    2           Brooklyn Nets*  72  241.7  43.1  ...  6.7  5.3  13.5  19.0  118.6
2    3      Washington Wizards*  72  241.7  43.2  ...  7.3  4.1  14.4  21.6  116.6
3    4               Utah Jazz*  72  241.0  41.3  ...  6.6  5.2  14.2  18.5  116.4
4    5  Portland Trail Blazers*  72  240.3  41.3  ...  6.9  5.0  11.1  18.9  116.1
5    6            Phoenix Suns*  72  242.8  43.3  ...  7.2  4.3  12.5  19.1  115.3
6    7           Indiana Pacers  72  242.4  43.3  ...  8.5  6.4  13.5  20.2  115.3
7    8          Denver Nuggets*  72  242.8  43.3  ...  8.1  4.5  13.5  19.1  115.1
8    9     New Orleans Pelicans  72  242.1  42.5  ...  7.6  4.4  14.6  18.0  114.6
9   10    Los Angeles Clippers*  72  240.0  41.8  ...  7.1  4.1  13.2  19.2  114.0
10  11           Atlanta Hawks*  72  241.7  40.8  ...  7.0  4.8  13.2  19.3  113.7
11  12         Sacramento Kings  72  240.3  42.6  ...  7.5  5.0  13.4  19.4  113.7
12  13    Golden State Warriors  72  240.3  41.3  ...  8.2  4.8  15.0  21.2  113.7
13  14      Philadelphia 76ers*  72  242.1  41.4  ...  9.1  6.2  14.4  20.2  113.6
14  15       Memphis Grizzlies*  72  241.7  42.8  ...  9.1  5.1  13.3  18.7  113.3
15  16          Boston Celtics*  72  241.4  41.5  ...  7.7  5.3  14.1  20.4  112.6
16  17        Dallas Mavericks*  72  240.3  41.1  ...  6.3  4.3  12.1  19.4  112.4
17  18   Minnesota Timberwolves  72  241.7  40.7  ...  8.8  5.5  14.3  20.9  112.1
18  19          Toronto Raptors  72  240.3  39.7  ...  8.6  5.4  13.2  21.2  111.3
19  20        San Antonio Spurs  72  242.8  41.9  ...  7.0  5.1  11.4  18.0  111.1
20  21            Chicago Bulls  72  241.4  42.2  ...  6.7  4.2  15.1  18.9  110.7
21  22      Los Angeles Lakers*  72  242.4  40.6  ...  7.8  5.4  15.2  19.1  109.5
22  23        Charlotte Hornets  72  241.0  39.9  ...  7.8  4.8  14.8  18.0  109.5
23  24          Houston Rockets  72  240.3  39.3  ...  7.6  5.0  14.7  19.5  108.8
24  25              Miami Heat*  72  241.4  39.2  ...  7.9  4.0  14.1  18.9  108.1
25  26         New York Knicks*  72  242.1  39.4  ...  7.0  5.1  12.9  20.5  107.0
26  27          Detroit Pistons  72  242.1  38.7  ...  7.4  5.2  14.9  20.5  106.6
27  28    Oklahoma City Thunder  72  241.0  38.8  ...  7.0  4.4  16.1  18.1  105.0
28  29            Orlando Magic  72  240.7  38.3  ...  6.9  4.4  12.8  17.2  104.0
29  30      Cleveland Cavaliers  72  242.1  38.6  ...  7.8  4.5  15.5  18.2  103.8

[30 rows x 25 columns]
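
One difference from the pd.read_html version: every cell extracted with .text is a string, so none of the columns are numeric yet. Below is a minimal sketch of the same three steps wrapped in a function, followed by a numeric conversion. The inline HTML is a tiny stand-in for the real table, and the function name table_to_df and the choice to convert every column except Team are assumptions for illustration:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Tiny stand-in for the real table, laid out the same way:
# a <th> rank cell followed by <td> cells in each body row.
SAMPLE_HTML = """
<table id="per_game-team">
  <thead><tr><th>Rk</th><th>Team</th><th>G</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><th>1</th><td>Milwaukee Bucks*</td><td>72</td><td>120.1</td></tr>
    <tr><th>2</th><td>Brooklyn Nets*</td><td>72</td><td>118.6</td></tr>
  </tbody>
</table>
"""


def table_to_df(html, table_id):
    """Parse one HTML table (selected by id) into a DataFrame of strings."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', {'id': table_id})
    if table is None:
        raise ValueError(f'no table with id {table_id!r}')
    headers = [th.text for th in table.find('thead').find_all('th')]
    rows = []
    for tr in table.find('tbody').find_all('tr'):
        # Collect the leading <th> and the <td> cells in document order.
        rows.append([cell.text for cell in tr.find_all(['th', 'td'])])
    return pd.DataFrame(rows, columns=headers)


df = table_to_df(SAMPLE_HTML, 'per_game-team')

# Convert everything except the team name from strings to numbers.
numeric_cols = [c for c in df.columns if c != 'Team']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
print(df.dtypes)
```

If some cells on the real page are empty or non-numeric, pd.to_numeric raises by default; passing errors='coerce' turns those cells into NaN instead.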
