I'm trying to get a data table off of a website using the RCurl package. My code works successfully for the URL that you get to by clicking through the website:

http://statsheet.com/mcb/teams/air-force/game_stats/

Once I try to select a previous season (which is what I want), my code no longer works.

Example link: http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013

I'm guessing this has something to do with the reserved symbol(s) in the year-specific address. I've tried URLencode as well as encoding the address manually, but neither has worked.

My code:

library(RCurl)
library(XML)

#Define URL
theurl <- URLencode("http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013", reserved=TRUE)

webpage <- getURL(theurl)
webpage <- readLines(tc <- textConnection(webpage)); close(tc)

pagetree <- htmlTreeParse(webpage, error=function(...){}, useInternalNodes = TRUE)

# Extract table header and contents
tablehead <- xpathSApply(pagetree, "//*/table[1]/thead[1]/tr[2]/th", xmlValue)
results <- xpathSApply(pagetree,"//*/table[1]/tbody/tr/td", xmlValue)

content <- as.data.frame(matrix(results, ncol = 19, byrow = TRUE))

testtablehead <- c("W/L","Opponent",tablehead[c(2:18)])
names(content) <- testtablehead

The relevant error that R returns:

Error in function (type, msg, asError = TRUE)  : 
  Could not resolve host: http%3a%2f%2fstatsheet.com%2fmcb%2fteams%2fair-force%2fgame_stats%3fseason%3d2012-2013; No data record of requested type

Does anyone have an idea what the problem is and how to fix it?

  • Does using reserved=FALSE give the same error? Commented Feb 12, 2014 at 14:51
  • With reserved=FALSE, R hangs while executing the getURL command. Update: it actually finished that time, but gave this error: Error in matrix(results, ncol = 19, byrow = TRUE) : 'data' must be of a vector type, was 'NULL' Commented Feb 12, 2014 at 14:55

1 Answer

Skip the unneeded encoding and the separate download of the URL; htmlTreeParse can read the URL directly:

library(XML)
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

pagetree <- htmlTreeParse(url, useInternalNodes = TRUE)
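
As for why the original call fails with "Could not resolve host": a small sketch, using only base R's URLencode and no network access, of what reserved=TRUE does to a complete URL. The helper has_scheme and the variable names are made up for illustration, and the exact case of the hex digits in the escapes may differ between R versions.

```r
# The full address from the question
url <- "http://statsheet.com/mcb/teams/air-force/game_stats?season=2012-2013"

# reserved = TRUE also escapes ":", "/", "?" and "=", so the result no longer
# starts with "http://" and the whole string ends up being treated as a host name
enc_all <- URLencode(url, reserved = TRUE)

# The default (reserved = FALSE) leaves the URL's structural characters alone
enc_default <- URLencode(url)

print(enc_all)
print(enc_default)

# Hypothetical helper: does the string still begin with a URL scheme?
has_scheme <- function(x) grepl("^https?://", x)
print(has_scheme(enc_all))
print(has_scheme(enc_default))

# If a query value ever does need escaping, encode only that piece
# and paste it onto the untouched base address
season <- URLencode("2012-2013", reserved = TRUE)
season_url <- paste0("http://statsheet.com/mcb/teams/air-force/game_stats?season=", season)
print(season_url)
```

This particular URL contains nothing that needs escaping, so passing it as-is (as above) should be enough as far as encoding is concerned.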

3 Comments

That gives me the error: Error in UseMethod("xmlNamespaceDefinitions") : no applicable method for 'xmlNamespaceDefinitions' applied to an object of class "NULL". Any ideas?
I don't get that error. So it may be a bug in XML, or one of us doesn't have the latest version of XML.
I don't get that error either. I am using XML ‘3.98.1.1’ and "R version 3.0.2 Patched (2013-11-25 r64299)" on Windows 8.1.
