I've used the XML package successfully to scrape multiple websites, but I'm having trouble creating a data frame from this specific page:

library(XML)

url <- "http://www.foxsports.com/nfl/injuries?season=2013&seasonType=1&week=1"
df1 <- readHTMLTable(url)

> print(df1)
$`NULL`
NULL

$`NULL`
NULL

$`NULL`
             Player Pos         Injury           Game Status
1       Dickson, Ed  TE          thigh              Probable
2      Jensen, Ryan   C           foot              Doubtful
3     Jones, Arthur  DE        illness                   Out
4   McPhee, Pernell  LB           knee              Probable
5     Pitta, Dennis  TE dislocated hip Injured Reserve (DFR)
6  Thompson, Deonte  WR           foot              Doubtful
7 Williams, Brandon  DT            toe              Doubtful

$`NULL`
           Player Pos        Injury Game Status
1  Anderson, C.J.  RB          knee         Out
2   Ayers, Robert  DE      Achilles    Probable
3   Bailey, Champ  CB          foot         Out
4     Clady, Ryan   T      shoulder    Probable
5  Dreessen, Joel  TE          knee         Out
6    Kuper, Chris   G         ankle    Doubtful
7 Osweiler, Brock  QB left shoulder    Probable
8     Welker, Wes  WR         ankle    Probable

$`NULL`

etc

If I try to coerce it I get this error:

> df1 <- data.frame(readHTMLTable(url))
Error in data.frame(`NULL` = NULL, `NULL` = NULL, `NULL` = list(Player = 1:7,  : 
  arguments imply differing number of rows: 0, 7, 8, 6, 9, 1, 11, 4, 12, 5, 21, 3, 2, 15
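For what it's worth, the error happens because data.frame() tries to line up all of its arguments row by row, and the tables on this page have different numbers of rows. A minimal sketch of the same failure (with made-up row counts, not the actual page data):

```r
# Two tables with different row counts, like two teams' injury lists
a <- data.frame(Player = 1:7)
b <- data.frame(Player = 1:8)

# data.frame(a, b)
# Error in data.frame(a, b) : arguments imply differing number of rows: 7, 8
```

So the tables need to be stacked with rbind (same columns, rows appended), not combined side by side with data.frame().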

I'd like all of the injury data (PLAYER, POS, INJURY, GAME STATUS) for all of the teams.

Thanks in advance.

  • You are getting a list of tables because the page contains multiple tables. The first two are probably headings, and some are tables for teams with no injuries, so they do not have the expected four columns... Commented Jul 31, 2014 at 15:14

2 Answers

You just need to remove the NULL elements and the one-column tables listing "No injuries reported", then rbind the rest using do.call:

# Keep only non-NULL tables with the expected four columns
n <- sapply(df1, function(x) !is.null(x) && ncol(x) == 4)
x <- do.call("rbind", df1[n])
rownames(x) <- NULL

2 Comments

Note also this doesn't get you the team names, which are stored in <div> elements outside the tables. Scrape for <div class="wisfb_injuryHeader">
Thanks Chris, the sapply works perfectly. I've never used anything with <div> but I'm sure I can figure it out.
# Packages
require(XML)
require(RCurl)

# URL of interest
url <- "http://www.foxsports.com/nfl/injuries?season=2013&seasonType=1&week=1"

# Parse HTML
doc <- htmlParse(url)

# Tables which are not nulls
df1 <- readHTMLTable(doc)
df.list <- df1[!sapply(df1, is.null)]

# Get table names
table.names <- xpathSApply(doc, "//div[@class='wisfb_injuryHeader']", function(x) gsub("^\\s+|\\s+$", "", xmlValue(x)))

# Assign names
names(df.list) <- table.names


# $`San Diego Chargers`
# Player Pos                         Injury Game Status
# 1    Floyd, Malcom  WR                           knee    Probable
# 2   Ingram, Melvin  LB                  Torn left ACL  Day-to-Day
# 3    Liuget, Corey  DE                       shoulder    Probable
# 4  Patrick, Johnny  CB concussion, not injury related    Probable
# 5     Royal, Eddie  WR              chest, concussion    Probable
# 6  Taylor, Brandon   S                           knee    Probable
# 7      Te'o, Manti  LB                           foot         Out
# 8 Wright, Shareece  CB                          chest    Probable
# #[etc.]
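If you then want everything in a single data frame, as the question asks for, here is a sketch building on df.list above: it drops the one-column "No injuries reported" tables and prepends each team name as a column (keep, inj, and combined are names I've made up):

```r
# Keep only the four-column injury tables
keep <- sapply(df.list, function(x) ncol(x) == 4)
inj <- df.list[keep]

# Prepend each table's team name as a Team column, then stack them
combined <- do.call(rbind, Map(cbind, Team = names(inj), inj))
rownames(combined) <- NULL
```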

EDIT: Just saw that @Spacedman said basically the same thing in one of the comments on the answer by @Chris S.

3 Comments

Hey all, I was running my code this morning and the readHTMLTable(url) from my code and the doc <- htmlParse(url) from @Tony no longer work (I get "Error in names(ans) = header : 'names' attribute [4] must be the same length as the vector [1]"). I'm assuming something changed on the website, but in terms of SO etiquette should I post a new question, or is asking it here cool? Thanks.
@FrankB. I just ran my code above and it worked fine. With respect to SO etiquette, I'm not sure. I'd ask it as a separate question, linking back to the original. I suppose the best place to ask about etiquette is over on meta: meta.stackoverflow.com
This is so odd. I just tried again and the same error. Obviously it used to work for me, and I'm copying/pasting my own code from here.
