
I was hoping for some help here. I'm trying to scrape the second table, the player goal and shot creation stats, from FBref for the MLS, but my script brings in the first table of team statistics instead. Can someone help? Code below.

URL: https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats

# libraries
import pandas as pd

# fbref table link
url_df = 'https://fbref.com/en/comps/22/gca/Major-League-Soccer-Stats'


df = pd.read_html(url_df)

df
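For context on why this grabs the wrong table: `pd.read_html` returns a *list* of DataFrames, one per `<table>` it can parse from the static HTML, so the first element is simply whichever table appears first on the page. A minimal sketch with made-up inline HTML:

```python
from io import StringIO
import pandas as pd

# Made-up inline HTML with two tables, mimicking a page where the
# table you want is not the first one.
html = """
<table><tr><th>A</th></tr><tr><td>1</td></tr></table>
<table><tr><th>B</th></tr><tr><td>2</td></tr></table>
"""

# read_html returns a list of DataFrames, one per <table> it can see
# in the static HTML.
tables = pd.read_html(StringIO(html))
print(len(tables))        # number of tables found
print(tables[0].columns)  # header of the first table only
```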
  • There is a difference between the URL before the code block and the URL in the code block (stats vs gca), but the URL in the code block looks like the one that you want. It also looks like the HTML page has more than two tables, but some tables are hidden by scripts. Still, you want one of those tables and it's not the first one. Did you mean to post more code than that last df line? Commented May 14, 2024 at 4:34
  • Does this answer your question? How to extract hidden table from fbref website by id? Commented May 14, 2024 at 5:34

1 Answer


You need to work a little harder here. The table you are after is actually inside an HTML comment in the static page, so simply calling pd.read_html() on the URL will not get it for you.

But you can manually extract the relevant comment and then parse that.

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
from io import StringIO

url = 'https://fbref.com/en/comps/22/gca/Major-League-Soccer-Stats'

response = requests.get(url)
response.raise_for_status()
html_content = response.text

soup = BeautifulSoup(html_content, 'html.parser')

# Collect every HTML comment in the page.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # Check if the comment contains the target <div>.
    if 'div_stats_gca' in comment:
        # Parse the comment as HTML and extract the target <div>.
        div = BeautifulSoup(comment, 'html.parser').find(id='div_stats_gca')

        df = pd.read_html(StringIO(str(div)))[0]
        print(df.iloc[:, :6])
        break
else:
    # The for/else only runs if the loop finishes without hitting break.
    print("Unable to find table.")

I'm only printing the first six columns here, but I can confirm that they are all there.
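One thing to be aware of: tables parsed this way typically come back with a two-level column header (a MultiIndex). If you prefer flat column names, a sketch like the following works; the tuples below are made up for illustration, so check `df.columns` on the real table before applying it.

```python
import pandas as pd

# Hypothetical two-level header, mimicking the shape pd.read_html
# produces for grouped stat tables.
cols = pd.MultiIndex.from_tuples([
    ('Unnamed: 0', 'Player'),
    ('SCA', 'SCA'),
    ('SCA', 'SCA90'),
])
df = pd.DataFrame([['A', 10, 1.2]], columns=cols)

# Join the two levels with '_', but drop the autogenerated
# 'Unnamed: N' placeholders that pandas uses for ungrouped columns.
df.columns = [t[1] if t[0].startswith('Unnamed') else '_'.join(t)
              for t in df.columns]
print(list(df.columns))
```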


These are the packages that I'm using:

lxml==5.2.2
pandas==2.2.2
beautifulsoup4==4.12.3
requests==2.31.0