
I'm trying to scrape the following website for MLB draft data:

https://www.baseballamerica.com/draft-history/mlb-draft-database/#/

The issue is that I can't seem to find the correct class to input into rvest::html_nodes() in order to isolate the table. Using Chrome's "Inspect" tool, I've tried each of the classes that would seemingly identify the table:


library(tidyverse)
library(rvest)

url <- "https://www.baseballamerica.com/draft-history/mlb-draft-database/#/"

url %>% 
  read_html() %>% 
  html_nodes("table-container")

I've also tried "search-table draft-search-table", but I keep getting the same results: "{xml_nodeset (0)}". Any help would be greatly, greatly appreciated!

  • The table is probably loaded with JavaScript after the page loads. rvest only sees the HTML shown in the "Sources" tab, which may not include everything the "Elements" tab shows. If you need to run the JavaScript on a page, you'll have to use a package like RSelenium. Commented Oct 11, 2019 at 15:08
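
To illustrate the comment above, here is a small offline sketch. It uses two made-up HTML strings rather than the real page, so the markup is an assumption and only the selector behaviour is the point. A CSS class selector needs a leading dot, and even the correct selector returns an empty node set when the HTML sent by the server does not contain the element yet.

```r
library(rvest)

# Hypothetical HTML standing in for what the server sends before JavaScript runs.
static_html <- read_html('<html><body><div id="app"></div></body></html>')

# Hypothetical HTML standing in for the DOM after JavaScript has filled in the table.
rendered_html <- read_html(
  '<html><body><div class="table-container"><table><tr><td>x</td></tr></table></div></body></html>'
)

# Without a dot, the selector looks for <table-container> elements, not a class.
n_no_dot <- length(html_nodes(rendered_html, "table-container"))

# With a dot, the selector matches elements whose class is "table-container".
n_dot <- length(html_nodes(rendered_html, ".table-container"))

# The static HTML has no such element, so even the correct selector finds nothing.
n_static <- length(html_nodes(static_html, ".table-container"))
```

Multiple classes on one element are chained with dots, for example ".search-table.draft-search-table", but that only helps when the element is present in the HTML that read_html() receives.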

1 Answer


The content is loaded dynamically from an API call that returns JSON, so the table is not in the HTML that read_html() receives. You can send a POST request to that API with httr to get the data:

library(httr)

# Declare the JSON content type; the body below is already a JSON string, which httr sends as is.
headers <- c("Content-Type" = "application/json")
# Search filters copied from the page's request; here Year is 2019 and Round is 1.
data <- '{"SigningBonusMin":"0","SigningBonusMax":"0","Year":"2019","Round":"1","TeamId":"0","FourYearSchoolType":"false","JuniorCollegeType":"false","HighSchoolType":"false","OtherSchoolType":"false","OverallNumber":"0","pageId":"1","paid":"false"}'
res <- POST(url = "https://www.baseballamerica.com/umbraco/api/draftdatabaseapi/advancedsearch", add_headers(.headers = headers), body = data)
# Parse the JSON response and keep its "Results" element.
r <- content(res)$Results
print(r)
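
If you want to vary the year or round, one option is to build the body from an R list instead of a hand-written JSON string. The following is only a sketch: build_draft_query() and fetch_draft_results() are hypothetical helper names, the field names are copied from the payload above, and the meaning of fields such as pageId is not documented here, so they are passed through unchanged. With encode = "json", httr serialises a named list to JSON and sets the content type itself.

```r
library(httr)

# Hypothetical helper: build the request body as a named list.
# Field names and default values are copied from the JSON payload in the answer.
build_draft_query <- function(year, round, page_id = "1") {
  list(
    SigningBonusMin = "0", SigningBonusMax = "0",
    Year = as.character(year), Round = as.character(round),
    TeamId = "0",
    FourYearSchoolType = "false", JuniorCollegeType = "false",
    HighSchoolType = "false", OtherSchoolType = "false",
    OverallNumber = "0", pageId = as.character(page_id), paid = "false"
  )
}

# Hypothetical helper: POST one query and return the parsed "Results" element.
# Requires network access; the shape of the result is whatever the API returns.
fetch_draft_results <- function(year, round, page_id = "1") {
  res <- POST(
    url = "https://www.baseballamerica.com/umbraco/api/draftdatabaseapi/advancedsearch",
    body = build_draft_query(year, round, page_id),
    encode = "json"
  )
  stop_for_status(res)  # fail loudly on HTTP errors instead of parsing an error page
  content(res)$Results
}
```

Calling fetch_draft_results(2019, 1) should then be equivalent to the request above, assuming the API accepts the same fields in any order.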