I'd like to scrape team advanced stats from stats.nba.com.

My current code to fetch the XHR resource where the data is stored is:

library(httr)
library(jsonlite)


nba <- GET('https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=11%2F12%2F2019&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=')

I get the URL via these steps in Chrome: Inspect -> Network -> XHR

The code throws this error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60

I also tried custom advanced filters on the website; each either produces the same error or leaves the code running forever. I'm not very experienced with web scraping, so I would appreciate it if anyone could point out what the issue is here.

1 Answer

I have had a good look at this. The site goes to some lengths to prevent scraping: it won't return the JSON from that URL unless the request carries cookies that are generated by a back-and-forth between your browser's JavaScript and their servers. They also monitor request timings with New Relic, so they are likely to block your IP if you scrape multiple pages. Scraping it wouldn't be impossible, but it would be very, very hard.
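Since a blocked request can either error out or hang, as described in the question, it can help to put a hard time limit on the call so the failure is at least quick and explicit. The following is only a minimal sketch: fetch_json is a hypothetical helper name, the 10-second default is arbitrary, and it does nothing to get past the cookie checks described above.

```r
library(httr)
library(jsonlite)

# Hypothetical helper (not part of httr): fetch a URL with a time limit and
# return the parsed JSON, or NULL if the request errors, times out, or the
# response is not valid JSON.
fetch_json <- function(url, seconds = 10) {
  tryCatch({
    resp <- GET(url, timeout(seconds))  # give up after `seconds` seconds
    stop_for_status(resp)               # turn HTTP 4xx/5xx into R errors
    fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  }, error = function(e) {
    message("Request failed: ", conditionMessage(e))
    NULL
  })
}
```

Calling fetch_json() on the URL from the question would then be expected to return NULL with a message rather than run indefinitely, assuming the server keeps refusing or stalling the request.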

If you are desperate for the data, you could look into the NBA API, which requires a sign-up but is free to use for up to 1000 requests per day.

The other option is to automate a browser with RSelenium and get the HTML of the fully rendered pages.

Of course, if you only want this one page, you can just copy the HTML from Chrome's inspector and then use rvest::read_html(paste(readClipboard(), collapse = "\n")). Note that readClipboard() is only available on Windows.
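Whichever way the rendered HTML is obtained, the table can then be extracted with rvest. Here is a minimal sketch of that step; the markup below is a made-up placeholder standing in for the copied page source, so the real stats.nba.com table will have different columns and may need a more specific selector.

```r
library(rvest)

# Placeholder markup standing in for the copied page source (an assumption;
# team names and numbers are dummy values, not real stats).
copied_html <- '
<table>
  <thead><tr><th>TEAM</th><th>GP</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>10</td></tr>
    <tr><td>Team B</td><td>11</td></tr>
  </tbody>
</table>'

page <- read_html(copied_html)

# html_table() on a whole document returns a list with one data frame
# per <table> element; take the first one here.
tables <- html_table(page)
stats <- tables[[1]]
print(stats)
```

On a real page with several tables, it is worth inspecting length(tables) and picking the right element rather than assuming the first one is the stats table.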

1 Comment

I really appreciate you taking the time to look into it. I am now looking at a repo called nba_api for Python. It looks promising so far, and I think I'll be able to make it work (I'm less proficient in Python, so it might take some time).
