
I was hoping for some help here. I'm trying to scrape the second table, the player goal and shot creation stats, from FBref for the MLS, but my script brings in the first table of team statistics instead. Can someone help? Code below.

URL: https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats

# libraries
import pandas as pd

# fbref table link
url_df = 'https://fbref.com/en/comps/22/gca/Major-League-Soccer-Stats'


df = pd.read_html(url_df)

df
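For context on why this grabs the wrong table: `pd.read_html` returns a *list* of DataFrames, one per `<table>` it can parse from the static HTML, so the first element is simply whichever table appears first on the page. A minimal sketch with made-up inline HTML:

```python
from io import StringIO
import pandas as pd

# Made-up inline HTML with two tables, mimicking a page where the
# table you want is not the first one.
html = """
<table><tr><th>A</th></tr><tr><td>1</td></tr></table>
<table><tr><th>B</th></tr><tr><td>2</td></tr></table>
"""

# read_html returns a list of DataFrames, one per <table> it can see
# in the static HTML.
tables = pd.read_html(StringIO(html))
print(len(tables))        # number of tables found
print(tables[0].columns)  # header of the first table only
```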
  • There is a difference between the URL before the code block and the URL in the code block (stats vs gca), but the URL in the code block looks like the one that you want. It also looks like the HTML page has more than two tables, but some tables are hidden by scripts. Still, you want one of those tables and it's not the first one. Did you mean to post more code than that last df line? Commented May 14, 2024 at 4:34
  • Does this answer your question? How to extract hidden table from fbref website by id? Commented May 14, 2024 at 5:34

1 Answer


You need to work a little harder here. The table you are after is actually inside an HTML comment in the static page, so simply calling pd.read_html() on the URL will not get it for you.

But you can manually extract the relevant comment and then parse that.

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
from io import StringIO

url = 'https://fbref.com/en/comps/22/gca/Major-League-Soccer-Stats'

response = requests.get(url)
response.raise_for_status()
html_content = response.text

soup = BeautifulSoup(html_content, 'html.parser')

# Collect every HTML comment in the page.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # Check if the comment contains the target <div>.
    if 'div_stats_gca' in comment:
        # Parse the comment as HTML and extract the target <div>.
        div = BeautifulSoup(comment, 'html.parser').find(id='div_stats_gca')

        df = pd.read_html(StringIO(str(div)))[0]
        print(df.iloc[:, :6])
        break
else:
    # The for/else only runs if the loop finishes without hitting break.
    print("Unable to find table.")

I'm only printing the first six columns here, but I can confirm that they are all there.
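One thing to be aware of: tables parsed this way typically come back with a two-level column header (a MultiIndex). If you prefer flat column names, a sketch like the following works; the tuples below are made up for illustration, so check `df.columns` on the real table before applying it.

```python
import pandas as pd

# Hypothetical two-level header, mimicking the shape pd.read_html
# produces for grouped stat tables.
cols = pd.MultiIndex.from_tuples([
    ('Unnamed: 0', 'Player'),
    ('SCA', 'SCA'),
    ('SCA', 'SCA90'),
])
df = pd.DataFrame([['A', 10, 1.2]], columns=cols)

# Join the two levels with '_', but drop the autogenerated
# 'Unnamed: N' placeholders that pandas uses for ungrouped columns.
df.columns = [t[1] if t[0].startswith('Unnamed') else '_'.join(t)
              for t in df.columns]
print(list(df.columns))
```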


These are the packages that I'm using:

lxml==5.2.2
pandas==2.2.2
beautifulsoup4==4.12.3
requests==2.31.0