How to scrape other fbref.com table? [duplicate]

Question

I am trying to get the player standard stats table from this page (https://fbref.com/en/comps/9/stats/Premier-League-Stats), but pd.read_html only returns the squad standard stats tables. Is there something I am missing? My code is below.

import pandas as pd

url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"

tables = pd.read_html(url)
print(len(tables))
print(tables[0])
print(tables[1])

The table you are trying to read with pd.read_html is not loaded when you request the page. If it were, you could use tables = pd.read_html(url, attrs={'id': 'stats_standard'}) to target that specific table by id.
– Captain Caveman, Commented May 9, 2023 at 23:00

jqurious · Accepted Answer · 2023-05-10 11:25:22Z

In cases like this, it usually helps to fetch the HTML with Python, save it locally, and inspect or search it in your editor, because it may differ from what your web browser shows.

import requests
from pathlib import Path

url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"

r = requests.get(url)
r.raise_for_status()  # stop early if the request failed

# save the raw response to a local file for inspection
Path("stats.html").write_bytes(r.content)

Searching the file for the player table in my editor brings me to line 1066.

Notice the <!-- on line 1064 - this means the table is actually commented out, which is why pandas does not "see" it.

1064 <!--¬
1065 ¬
1066 <div class="table_container" id="div_stats_standard">¬
1067 ¬

When you open the webpage in your web browser, JavaScript uncomments the tables.
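To see the effect in isolation, here is a minimal sketch that uses a made-up miniature page instead of the real site. The HTML string, the team and player names, and the variable names `visible` and `hidden` are all hypothetical; only the `div_stats_standard` / `stats_standard` ids are taken from the page above. It shows that pd.read_html skips a table that sits inside an HTML comment, and that the same table parses fine once the comment text is handed to pandas directly.

```python
import io

import bs4
import pandas as pd

# Hypothetical miniature page: one live table and one table inside a comment.
HTML = """
<html><body>
<table id="squads">
  <thead><tr><th>Squad</th><th>Gls</th></tr></thead>
  <tbody><tr><td>Team A</td><td>1</td></tr></tbody>
</table>
<!--
<div class="table_container" id="div_stats_standard">
<table id="stats_standard">
  <thead><tr><th>Player</th><th>Gls</th></tr></thead>
  <tbody><tr><td>Example Player</td><td>3</td></tr></tbody>
</table>
</div>
-->
</body></html>
"""

# pandas only parses live markup, so the commented-out table is not returned.
visible = pd.read_html(io.StringIO(HTML))

# BeautifulSoup exposes comments as Comment nodes; their text can be parsed separately.
soup = bs4.BeautifulSoup(HTML, "html.parser")
comments = soup.find_all(string=lambda node: isinstance(node, bs4.element.Comment))
hidden = [
    df
    for comment in comments
    if "<table" in comment
    for df in pd.read_html(io.StringIO(str(comment)))
]

print(len(visible), len(hidden))
```

On the real page the same loop would likely return every commented-out table, so you would still need to pick the one you want, for example by checking for its id in the comment text.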

You can extract the commented out data "manually" and pass it to pandas:

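import io

import bs4
import pandas as pd
import requests

url = "https://fbref.com/en/comps/9/stats/Premier-League-Stats"

r = requests.get(url)
soup = bs4.BeautifulSoup(r.content, "html.parser")

# opening tag that identifies the commented-out block holding the player table
table_id = """<div class="table_container" id="div_stats_standard">"""

# start at the element labelled "Player Standard Stats", then take the next
# HTML comment that contains the table container
table = (
    soup.find(attrs={"data-label": "Player Standard Stats"})
        .find_next(string=lambda tag:
            isinstance(tag, bs4.element.Comment) and table_id in tag
        )
)

# wrap the markup in StringIO: recent pandas versions deprecate passing
# literal HTML strings to read_html
player_stats = pd.read_html(io.StringIO(table))[0]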

>>> player_stats
    Unnamed: 0_level_0   Unnamed: 1_level_0 Unnamed: 2_level_0  ... Per 90 Minutes          Unnamed: 36_level_0
                    Rk               Player             Nation  ...           npxG npxG+xAG             Matches
0                    1     Brenden Aaronson             us USA  ...           0.14     0.31             Matches
1                    2            Che Adams            sct SCO  ...           0.30     0.42             Matches
2                    3          Tyler Adams             us USA  ...           0.00     0.06             Matches
3                    4     Tosin Adarabioyo            eng ENG  ...           0.03     0.05             Matches
4                    5         Nayef Aguerd             ma MAR  ...           0.14     0.17             Matches
..                 ...                  ...                ...  ...            ...      ...                 ...
574                553        Jordan Zemura             zw ZIM  ...           0.02     0.14             Matches
575                554  Oleksandr Zinchenko             ua UKR  ...           0.06     0.12             Matches
576                555         Hakim Ziyech             ma MAR  ...           0.08     0.35             Matches
577                556           Kurt Zouma             fr FRA  ...           0.10     0.11             Matches
578                557      Martin Ødegaard             no NOR  ...           0.31     0.54             Matches

[579 rows x 37 columns]
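The output above has 579 rows while Rk only reaches 557, and the columns have two levels. That pattern suggests the table body repeats its header row at intervals, although this is an inference from the row counts and worth checking on your own data. A minimal cleanup sketch under that assumption follows; the `tidy` helper and the small `raw` frame are hypothetical stand-ins for `player_stats`, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for player_stats: two-level columns plus one repeated header row.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Per 90 Minutes", "npxG"),
])
raw = pd.DataFrame(
    [
        ["1", "Player A", "0.14"],
        ["Rk", "Player", "npxG"],  # header row repeated inside the table body
        ["2", "Player B", "0.30"],
    ],
    columns=columns,
)


def tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Flatten two-level columns and drop repeated header rows."""
    out = df.copy()
    # Keep the group name only when it is not an "Unnamed: ..." placeholder.
    out.columns = [
        low if top.startswith("Unnamed") else f"{top} {low}"
        for top, low in out.columns
    ]
    # Repeated header rows carry the literal column name in the Rk column.
    out = out[out["Rk"] != "Rk"].reset_index(drop=True)
    out["Rk"] = pd.to_numeric(out["Rk"])
    return out


clean = tidy(raw)
print(clean)
```

Prefixing the group name keeps columns distinct when the same stat name appears under more than one group.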
