
Web page: https://fbref.com/en/comps/9/gca/Premier-League-Stats

I have scraped the top table and I'm now attempting to scrape the second.

import requests
from bs4 import BeautifulSoup

URL = 'https://fbref.com/en/comps/9/gca/Premier-League-Stats'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

stepa = soup.find(id="all_stats_gca")

The above works fine, but I cannot get any further. I would have thought the next step would be:

stepb = stepa.find("div", {"class": "table_outer_container"})

but printing this returns None. Any other suggestions?

Comments

  • After quickly checking the source code of that page, I didn't see any div with a class named table_outer_container inside the div with the id all_stats_gca. Commented May 11, 2020 at 21:49
  • Maybe you want the div with the id all_stats_gca_squads. Commented May 11, 2020 at 21:50
  • Pretty sure it's there, though a little way down. And no, I used all_stats_gca_squads for the first table I scraped. @revliscano Commented May 11, 2020 at 22:05
  • Oh yes, right. The problem is that the content you're interested in is commented out. I checked that they add a class named commented to that div. They must be doing that as a way of protecting their data. You can see this by opening the page source (CTRL + U) instead of inspecting the elements in the devtools. Commented May 11, 2020 at 22:26
  • Yes, I confirmed that they have a function in their JS file to show the commented-out content. Nice protection from them, I must say. I'll keep it in mind for the future. Commented May 11, 2020 at 22:27

1 Answer

As I said in the comments, the problem with the page you're trying to parse is that the div with the class table_outer_container is commented out in the HTML, so find() returns None. BeautifulSoup treats the commented-out markup as a single Comment node rather than as tags, so that div is not among the elements searched under stepa.

As a workaround (based on this answer), you can pull out the comment and parse its text as a separate document to get that commented-out div:

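from bs4 import BeautifulSoup, Comment

# Collect the HTML comment nodes inside stepa
stepb = stepa.find_all(string=lambda text: isinstance(text, Comment))
# Take the first comment's text and normalise its whitespace
comment_content = stepb[0].extract().replace('\n', ' ').replace('\t', ' ')
# Parse the comment's text as its own document
new_soup = BeautifulSoup(comment_content, 'html.parser')

table_outer_container = new_soup.find("div", {"class": "table_outer_container"})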
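
To try the idea without hitting the site, here is a minimal offline sketch. The SAMPLE_HTML string is a made-up, heavily simplified stand-in for the real page (the table id, heading and rows are assumptions, not the site's actual markup), and find_in_comments is a hypothetical helper name, not part of BeautifulSoup. It checks every comment under the parent rather than assuming the first one is the right one.

```python
from bs4 import BeautifulSoup, Comment

# Made-up, simplified stand-in for the page: the table markup sits inside an HTML comment.
SAMPLE_HTML = """
<div id="all_stats_gca">
  <div class="section_heading"><h2>Player Goal and Shot Creation</h2></div>
  <!--
  <div class="table_outer_container">
    <table id="stats_gca">
      <tr><th>Player</th><th>SCA</th></tr>
      <tr><td>Player A</td><td>10</td></tr>
    </table>
  </div>
  -->
</div>
"""


def find_in_comments(parent, name, attrs):
    """Hypothetical helper: return the first tag matching name/attrs that is
    hidden inside any HTML comment under parent, or None if there is none."""
    for comment in parent.find_all(string=lambda text: isinstance(text, Comment)):
        # A Comment is a string subclass, so it can be parsed as its own document.
        hidden = BeautifulSoup(comment, "html.parser")
        match = hidden.find(name, attrs)
        if match is not None:
            return match
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
stepa = soup.find(id="all_stats_gca")

# The direct search misses the div because it only exists as comment text.
direct = stepa.find("div", {"class": "table_outer_container"})

# Searching inside the comments finds it.
container = find_in_comments(stepa, "div", {"class": "table_outer_container"})

# Flatten the table into a list of rows, one list of cell texts per <tr>.
rows = [
    [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    for row in container.find_all("tr")
]
print(direct)
print(rows)
```

Against the real page, stepa would come from the requests/BeautifulSoup code in the question instead of SAMPLE_HTML; whether the live markup still matches this shape is something to verify in the page source.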