I'm trying to load some publicly available NHS data using R and the XML package but I keep getting the following error message:

Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

I can't figure out what is causing this, despite looking through a few related questions.

Here is my very simple code:

library("XML")
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(url)

Edit: Session Information

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

  • It's not a valid XML document (see the W3 Validator). It should at least be XHTML; HTML5 is not. Commented May 2, 2014 at 14:32
  • When I run the code on an Ubuntu box it succeeds, and it also runs on r-fiddle. Can you add sessionInfo() please? r-fiddle.org/#/fiddle?id=AfoyOSGm Commented May 2, 2014 at 14:36
  • sessionInfo() added! I suspect I have the answer already though. This is almost certainly being caused by my work's proxy. I've hit issues with this before (via QGIS) and have never found a satisfactory solution. Commented May 6, 2014 at 12:13
  • @Tumbledown, I had the same problem. However after I rebooted my R session it worked again .... weird. Commented Jan 23, 2016 at 0:07

2 Answers


The XML package has some issues here. The problem is intermittent and has nothing to do with the URL. I solved it by using the GET function from the httr package to fetch the HTML, then passing the result to htmlParse, as shown below:

library("XML")
library(httr)
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(rawToChar(GET(url)$content))
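
If the proxy suspected in the comments turns out to be the cause, the same idea can be extended with httr's proxy support. What follows is only a sketch under assumptions: fetch_html, parse_html_text, the proxy host and port, and the UTF-8 encoding are all hypothetical choices for illustration, not details from the question. It also checks the HTTP status and passes the page with asText = TRUE, so htmlParse never tries to open the URL itself.

```r
library(XML)  # httr is called via httr:: below and must be installed

# Parse HTML that is already in memory; asText = TRUE stops htmlParse
# from treating the string as a file name or URL.
parse_html_text <- function(txt) {
  htmlParse(txt, asText = TRUE)
}

# Hypothetical helper: download with httr (optionally through a proxy),
# stop with an error on an HTTP failure, then parse the response body.
fetch_html <- function(url, proxy_host = NULL, proxy_port = NULL) {
  cfg <- if (is.null(proxy_host)) list() else httr::use_proxy(proxy_host, proxy_port)
  resp <- httr::GET(url, cfg)
  httr::stop_for_status(resp)
  # Assumption: the page is served as UTF-8.
  parse_html_text(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (the proxy details are placeholders, replace them with your own):
# doc <- fetch_html(url, proxy_host = "proxy.example.com", proxy_port = 8080)
```

Calling fetch_html(url) without the proxy arguments behaves like the two-line version above, with the added status check.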


You can also use rvest & the xml2 packages:

library(rvest) # github version
library(xml2)  # github version

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- read_html(url)

doc %>% 
  html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>% 
  html_attr("href")

## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/"
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/" 
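
For comparison, a document parsed with the XML package's htmlParse can be searched in a similar way using XPath. This is a minimal sketch, not part of the original answer: the inline HTML is made-up markup standing in for the real page so the example runs without a network connection, and the variable names are only illustrative.

```r
library(XML)

# Small inline page standing in for the real one (made-up markup).
html <- paste0(
  "<html><body>",
  "<a href='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/'>Overnight</a>",
  "<a href='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/'>Day only</a>",
  "<a href='http://www.england.nhs.uk/'>Home</a>",
  "</body></html>"
)
doc <- htmlParse(html, asText = TRUE)

# XPath counterpart of the CSS selector a[href^='...']:
# keep only links whose href starts with the given prefix.
prefix <- "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/"
links <- xpathSApply(
  doc,
  sprintf("//a[starts-with(@href, '%s')]", prefix),
  xmlGetAttr, "href"
)
links
```

Here links is a character vector holding the two matching hrefs; the "Home" link is filtered out because its href does not start with the prefix.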

1 Comment

This second set of commands returns a set of data, whereas the previous one returned a value that could not be searched as easily.
