In the past, I have been able to use readHTMLTable in R to pull some football stats. When trying to do so again this year, the tables aren't showing up, even though they are visible on the webpage. Here is an example: http://www.pro-football-reference.com/boxscores/201609080den.htm

When I view the source for the page, the tables are all commented out (which I suspect is why readHTMLTable didn't find them).

Example: search for "team_stats" in source code...

    <!--  
    <div class="table_outer_container">
    <div class="overthrow table_container" id="div_team_stats">
    <table class="stats_table" id="team_stats" data-cols-to-freeze=1><caption>Team Stats Table</caption>

Questions:

How can the table be commented out in the source yet display in the browser?

Is there a way to read the commented out tables using readHTMLTable (or some other method)?

  • perhaps—in the raw text before parsing—gsub-out the <!-- and -->? Commented Sep 9, 2016 at 22:04
  • If it's commented out, it's no longer a table, just incidental text. Commented Sep 9, 2016 at 22:08
  • I also don't think the tables you think are commented out are actually commented out. Commented Sep 9, 2016 at 22:13
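
The gsub suggestion in the first comment could be sketched roughly as follows. This is only an illustration under stated assumptions: `raw_html` is a made-up fragment standing in for the page text, not the real page, and the pattern removes every comment delimiter, so any markup that was commented out for other reasons becomes live as well.

```r
library(rvest)  # re-exports read_html() and %>%

# Hypothetical fragment standing in for the raw page text (not the real page)
raw_html <- '<div id="all_team_stats">
<!--
<table id="team_stats">
<tr><th>Stat</th><th>CAR</th></tr>
<tr><td>First Downs</td><td>21</td></tr>
</table>
-->
</div>'

# Delete the comment delimiters so the markup inside becomes ordinary HTML
uncommented <- gsub('<!--|-->', '', raw_html)

# The table is now visible to the parser and can be selected by its id
stats <- uncommented %>%
    read_html() %>%
    html_node('#team_stats') %>%
    html_table()
```

For a live page, the raw text would have to be fetched as a single string first, for example with `paste(readLines(url, warn = FALSE), collapse = '\n')`, before applying the same `gsub` call.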

1 Answer


You can, in fact, grab it if you use the XPath comment() selector:

    library(rvest)

    url <- 'http://www.pro-football-reference.com/boxscores/201609080den.htm'

    url %>% read_html() %>%                   # parse html
        html_nodes('#all_team_stats') %>%     # select node with comment
        html_nodes(xpath = 'comment()') %>%   # select comments within node
        html_text() %>%                       # return contents as text
        read_html() %>%                       # parse text as html
        html_node('table') %>%                # select table node
        html_table()                          # parse table and return data.frame

    ##                                 CAR           DEN
    ## 1         First Downs            21            21
    ## 2        Rush-Yds-TDs      32-157-1      29-148-2
    ## 3   Cmp-Att-Yd-TD-INT 18-33-194-1-1 18-26-178-1-2
    ## 4        Sacked-Yards          3-18          2-19
    ## 5      Net Pass Yards           176           159
    ## 6         Total Yards           333           307
    ## 7        Fumbles-Lost           0-0           1-1
    ## 8           Turnovers             1             3
    ## 9     Penalties-Yards          8-85          4-22
    ## 10   Third Down Conv.          9-15          5-10
    ## 11  Fourth Down Conv.           0-0           1-1
    ## 12 Time of Possession         32:19         27:41
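
The same idea can be tried without a network connection, and extended to collect every commented-out table rather than one known node. The sketch below is an assumption-laden illustration: `page_src` is a small invented stand-in for the downloaded page, and `hidden_tables` is just a name chosen here, not part of rvest.

```r
library(rvest)  # re-exports read_html() and %>%

# Hypothetical stand-in for the downloaded page: one table hidden in a comment
page_src <- '<html><body>
<div id="all_team_stats">
<!--
<table id="team_stats">
<tr><th>Stat</th><th>CAR</th><th>DEN</th></tr>
<tr><td>First Downs</td><td>21</td><td>21</td></tr>
<tr><td>Total Yards</td><td>333</td><td>307</td></tr>
</table>
-->
</div>
</body></html>'

page <- read_html(page_src)

# Before un-commenting, the parser finds no table nodes at all
n_visible <- length(html_nodes(page, 'table'))

# Pull the text of every comment in the document
comment_text <- page %>%
    html_nodes(xpath = '//comment()') %>%
    html_text()

# Keep only the comments that actually contain table markup
comment_text <- comment_text[grepl('<table', comment_text, fixed = TRUE)]

# Re-parse each remaining comment as HTML and convert its first table
hidden_tables <- lapply(comment_text, function(txt) {
    txt %>% read_html() %>% html_node('table') %>% html_table()
})
```

Here `n_visible` shows that a plain `table` selector misses the commented-out markup, which matches what the question describes, while `hidden_tables` holds one parsed table per matching comment.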