
I am new to web scraping in R and have recently run into a problem with sites that reference JavaScript. I am attempting to scrape the data from the web page below and have been unsuccessful. I believe that the JavaScript links prevent me from accessing the table. As a result, the function "readHTMLTable" from the R package "XML" comes up null.

library(XML)
library(RCurl)
url <- "http://votingrights.news21.com/interactive/movement-voter-id/index.html"
html <- getURL(url)                                   # download the raw HTML as a string
doc <- htmlParse(html, asText = TRUE)                 # parse the downloaded HTML
tabs <- readHTMLTable(doc, stringsAsFactors = FALSE)  # comes back empty

How can I access the JavaScript links to get to the data? Or is this even possible? When using the direct link to the data (below) with the R package "rjson", I am still unable to read in the data.

library("rjson")
json_file <- "http://votingrights.news21.com/static/interactives/movement/data/fulldata.js"
lines <- readLines(json_file)
# collapse belongs to paste(), not fromJSON(); this still fails to parse
json_data <- fromJSON(paste(lines, collapse = ""))

1 Answer


The file you reference is a JavaScript file that assigns JSON to a variable, rather than a pure JSON file. In this case you can manually scrub the contents to get the data:

library("rjson")
json_file <- "http://votingrights.news21.com/static/interactives/movement/data/fulldata.js"
lines <- readLines(json_file)
# Remove the JavaScript assignment ("... = ") from the first line
lines[1] <- sub(".* = (.*)", "\\1", lines[1])
# Remove the trailing semicolon from the last line
lines[length(lines)] <- sub(";", "", lines[length(lines)])
json_data <- fromJSON(paste(lines, collapse = "\n"))
> head(json_data[[1]][[1]])
$state
[1] "Alabama"

$bill
[1] "HB 19"

$category
[1] "Strict photo ID"

$introduced
[1] "Mar 1, 2011"

$house
[1] "Yes"

$senate
[1] "Yes"
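The two `sub()` calls above can be folded into a small reusable helper so the scrubbing does not have to be redone by hand for each file. This is a minimal sketch, assuming the file consists of a single `name = <JSON>;` assignment; `js_assignment_to_json` is a hypothetical name, not part of rjson. It works on the whole text rather than line by line, so it does not depend on where the line breaks fall.

```r
library(rjson)

# Hypothetical helper (not part of rjson): parse a JavaScript file of the form
#   var someName = <JSON literal>;
# by removing the assignment prefix and the trailing semicolon.
js_assignment_to_json <- function(js_lines) {
  # Join the lines so the regexes see the whole file as one string
  js_text <- paste(js_lines, collapse = "\n")
  # Drop everything up to and including the first "=" plus any whitespace after it
  json <- sub("^[^=]*=\\s*", "", js_text)
  # Drop a trailing semicolon (and any whitespace after it)
  json <- sub(";\\s*$", "", json)
  fromJSON(json)
}

# Small inline example standing in for readLines() output
js <- c('var fullData = [',
        '  {"state": "Alabama", "bill": "HB 19"}',
        '];')
str(js_assignment_to_json(js))
```

With the file from the question this would be `json_data <- js_assignment_to_json(readLines(json_file))`. If a file contains comments before the assignment or several statements, the two regexes would need adjusting.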

If you want to interact with the JavaScript data on the web page, you can use Selenium, which loads the page in a (here headless) browser that runs its scripts:

library(RSelenium)
appURL <- "http://votingrights.news21.com/static/interactives/movement/index.html"
pJS <- phantom()                                # start a PhantomJS server
remDr <- remoteDriver(browserName = "phantom")
remDr$open()
remDr$navigate(appURL)                          # the page's JavaScript runs in the browser
fullData <- remDr$executeScript("return fullData;")  # read the page's fullData variable
remDr$close()
pJS$stop()
> head(fullData[[1]][[1]])
$state
[1] "Alabama"

$bill
[1] "HB 19"

$category
[1] "Strict photo ID"

$introduced
[1] "Mar 1, 2011"

$house
[1] "Yes"

$senate
[1] "Yes"

4 Comments

Thank you! I had tried this before, but I missed the step where you sub the ";" out, so I wasn't able to get it to work. This solution works well. I am wondering, however, whether there is a package that will read in this type of script without my having to manually scrub the contents each time...
You can use Selenium and access the javascript data directly. See various vignettes at cran.r-project.org/web/packages/RSelenium/index.html
I had a similar problem a couple of days ago. I went with RSelenium and it solved it well. You might find it interesting to see stackoverflow.com/questions/27305824/…
Thank you. I will check out RSelenium.
