.find() returning None when web scraping - BeautifulSoup Python

Question

web page: https://fbref.com/en/comps/9/gca/Premier-League-Stats

I have scraped the top table and I'm now attempting to scrape the second.

import requests
from bs4 import BeautifulSoup

URL = 'https://fbref.com/en/comps/9/gca/Premier-League-Stats'
page = requests.get(URL)


soup = BeautifulSoup(page.content, 'html.parser')


stepa = soup.find(id="all_stats_gca")

The above works fine, but I cannot get any further. I would have thought the next step would be

stepb = stepa.find("div", {"class": "table_outer_container"})

but printing stepb shows None. Any other suggestions?

After quickly checking the source code of that page, I didn't see any div with a class named table_outer_container inside of the div with the id all_stats_gca
– revliscano, Commented May 11, 2020 at 21:49
Pretty sure it's there, though a little way down. And no, I used all_stats_gca_squads for the first table I scraped. @revliscano
– Nenny Dunnazz, Commented May 11, 2020 at 22:05
Oh yes, right. The problem is that the content you're interested in is commented out. I checked that they add a class named commented to that div. They must be doing that as a way of protecting their data. You can see this by opening the page source (CTRL + U) instead of inspecting the elements in the devtools.
– revliscano, Commented May 11, 2020 at 22:26
Yes, I confirmed that they have a function in their JS file to show the commented content. Nice protection from them, I must say. I'll keep it in mind for the future.
– revliscano, Commented May 11, 2020 at 22:27

revliscano · Accepted Answer · 2020-05-11 23:13:38Z

As I said in the comments, the problem with the page you're trying to parse is that the div with the class table_outer_container is wrapped in an HTML comment, so find() returns None. BeautifulSoup treats the commented-out markup as a single Comment node rather than as tags, so that div is not among the elements that stepa.find() searches.
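To make the cause concrete, here is a minimal sketch that reproduces the behaviour offline. The `html` string is a hypothetical, heavily simplified stand-in for the real page (the table id and cell value are made up), so treat it as an illustration rather than the site's actual markup.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for the page's markup (not the real page).
html = """
<div id="all_stats_gca">
  <!--
  <div class="table_outer_container">
    <table id="stats_gca"><tr><td>42</td></tr></table>
  </div>
  -->
</div>
"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find(id="all_stats_gca")

# The commented-out markup is never parsed into tags, so find() has nothing to match.
inner = wrapper.find("div", {"class": "table_outer_container"})

# The markup is still there, but only as the text of a Comment node.
comments = wrapper.find_all(string=lambda text: isinstance(text, Comment))
```

Here `inner` is None, while `comments` holds one Comment whose text contains the hidden div.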

As a workaround (based on this answer), you can pull out the comment and parse its contents as HTML to get that div:

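from bs4 import BeautifulSoup, Comment

# Collect the HTML comments inside stepa; index 0 assumes the table is in the first one.
stepb = stepa.find_all(string=lambda text: isinstance(text, Comment))
comment_content = stepb[0].extract().replace('\n', ' ').replace('\t', ' ')
# Parse the comment's text as HTML so its tags become searchable.
new_soup = BeautifulSoup(comment_content, 'html.parser')

table_outer_container = new_soup.find("div",{"class":"table_outer_container"})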
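If the wrapper might contain more than one comment, a slightly more defensive variant could check each comment instead of relying on index 0. The sketch below is only an illustration: `find_in_comments` is a hypothetical helper name, and `sample` is made-up markup with placeholder data standing in for the real page.

```python
from bs4 import BeautifulSoup, Comment


def find_in_comments(parent, name, attrs):
    """Return the first matching tag found inside any HTML comment under parent."""
    for comment in parent.find_all(string=lambda s: isinstance(s, Comment)):
        if "<" not in comment:
            continue  # plain-text comment, no markup to parse
        hidden = BeautifulSoup(comment, "html.parser").find(name, attrs)
        if hidden is not None:
            return hidden
    return None


# Hypothetical sample markup with placeholder data (not the real page).
sample = """
<div id="all_stats_gca">
  <!-- a plain note, not markup -->
  <!--
  <div class="table_outer_container">
    <table>
      <tr><th>Squad</th><th>GCA</th></tr>
      <tr><td>Team A</td><td>1</td></tr>
    </table>
  </div>
  -->
</div>
"""

stepa = BeautifulSoup(sample, "html.parser").find(id="all_stats_gca")
container = find_in_comments(stepa, "div", {"class": "table_outer_container"})

# Flatten the recovered table into a list of rows of cell text.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in container.find_all("tr")
]
```

The helper returns None when no comment contains a match, so the caller can check the result the same way it would check a normal find() call.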
