
I am new to coding and need some assistance. I am trying to make a web scraper for a project that involves scraping NFL roster data from 2000 to 2023, but am getting an error when requesting the HTML. I am using JupyterLab (Python-Pyodide) to write my code, and this is the only code I have:

import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO

years = list(range(2000, 2024))
url = 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'
data = requests.get(url)

This is the error I'm getting:

(JsException: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023'.)

Can you explain why I am getting this error and how to fix it?

2 Answers


You didn't specify the request headers. Also, this page doesn't use table tags, so you can't use pd.read_html; you have to walk the div-based table yourself:

import requests
from bs4 import BeautifulSoup
import pandas as pd


url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
headers = {
  'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
  'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
}
result = []
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
# The roster is a div-based table, not a <table>, so locate it by class
table = soup.find('div', class_='divtable divtable-striped divtable-mobile')
# Column names live in the 'thead' div
table_head = [head.get_text() for head in table.find('div', class_='thead')]
# Drop the mobile-only labels so they don't pollute the cell text
for s in table.find_all('span', class_='visible-xs-inline'):
    s.extract()
# Each row is a div.tr and each cell a div.td; zip cells with the header
for row in table.find_all('div', class_='tr'):
    result.append(dict(zip(table_head, [cell.get_text() for cell in row.find_all('div', class_='td')])))
df = pd.DataFrame(result)
print(df)
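Since the question defines a `years` list that goes unused, here is a sketch of how the same scrape could be extended to all seasons. The URL pattern for earlier years is an assumption based on the single 2023 URL in the question; verify it holds before scraping them all.

```python
# Sketch: build the roster URL for each season. The URL template is an
# assumption generalized from the one 2023 URL in the question.
BASE = "https://www.footballdb.com/teams/nfl/{team}/roster/{year}"

def roster_url(team_slug: str, year: int) -> str:
    """Return the roster URL for one team and one season."""
    return BASE.format(team=team_slug, year=year)

years = range(2000, 2024)
urls = [roster_url("arizona-cardinals", y) for y in years]

# Each URL can then be fetched (with the headers above and a polite delay)
# and parsed exactly as in the answer, tagging each frame with its year.
```

Concatenating the per-year frames with `pd.concat` afterwards would give one DataFrame covering all seasons.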

OUTPUT:

     #            Player Pos   G  GS Age            College
0   82   Andre Baccellia  WR   5   0  26         Washington
1    3       Budda Baker  DB  12  12  27         Washington
2   96        Eric Banks  DE   2   0  25  Texas-San Antonio
3   51       Krys Barnes  LB  16   6  25               UCLA
4   66    Jackson Barton  OT   1   0  28               Utah
..  ..               ...  ..  ..  ..  ..                ...
73  21  Garrett Williams  DB   9   6  22           Syracuse
74  27     Divaad Wilson  DB   2   1  23    Central Florida
75  20      Marco Wilson  DB  15  11  24            Florida
76  14    Michael Wilson  WR  13  12  23           Stanford
77  10        Josh Woods  LB  11   7  27           Maryland

6 Comments

I just tried this on my end but it still did not work. I still got errors such as JSException, _RequestError, HTTPException, ProtocolError, and ConnectionError. Do I have to change the 'accept': ...' part on my end? Or is there some other reason I am getting these errors?
@RaulOjeda Then why did you mark it as accepted? @Sergey - And I am seeing AttributeError: 'NoneType' object has no attribute 'find' on the line table_head = [head.get_text() for head in table.find('div', class_='thead')]. Importantly, at present the code given in this answer won't work where the OP specified: "I am using Jupyter labs (Python-Pyodide) to write my code". The network ability of JupyterLite is restricted by security in the browser. You cannot directly translate what works for an ipykernel to a pyodide-based kernel at this time, without accommodations.
@Wayne my bad, I used a plain Linux terminal for the test; I didn't see that the OP asked about Jupyter labs.
Understandable. It's not just typical JupyterLab, either: they specifically meant JupyterLite, which has a JupyterLab flavor.
@Wayne How do you make those accommodations in the browser? Or is it easier to just do it from the desktop version?

You need to send headers with your GET request, specifically User-Agent. Sending this value makes the request look as if it comes from a browser (i.e. a real person) rather than a bot/scraper. You can find your value easily by Googling "what is my user agent". Copy that entire string; you will need it in a minute.

Declare a dict using the value you copied:

my_headers = {
    "User-Agent": "<YOUR_VALUE>"
}

Pass headers as an argument in the get method:

my_url = "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023"
data = requests.get(url=my_url, headers=my_headers)
print(data.content) # just to confirm you got the response back

Here is the scenic route to get your User-Agent and see what values are/could be there in "headers", if you're interested:

  1. Hit F12 on your keyboard when viewing this page. The developer tools will open up.
  2. Navigate to the "Network" tab
  3. Choose "All"
  4. If you don't see anything, no worries; just refresh the page
  5. Click on an item, you will see another section pop up
  6. Click on "Headers" and scroll down until you find "User-Agent"
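The same idea can be checked offline with the standard library's urllib, shown here as a sketch with a placeholder User-Agent string (substitute the value you copied from your browser):

```python
import urllib.request

# Placeholder User-Agent; substitute your real browser value here.
my_headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

req = urllib.request.Request(
    "https://www.footballdb.com/teams/nfl/arizona-cardinals/roster/2023",
    headers=my_headers,
)
# The header is attached to the Request object before any network call,
# so you can inspect it without actually hitting the site:
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header, just as `requests.get(url, headers=...)` does.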

Comments
