htmlParse fails to load external entity

Question

I'm trying to load some publicly available NHS data using R and the XML package but I keep getting the following error message:

Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"

I can't figure out what is causing this, despite looking through a few related questions.

Here is my very simple code:

library("XML")
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(url)

Edit: Session Information

R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1

It's not a valid XML document: W3 Validator. It should at least be XHTML; HTML5 is not.
– CodeManX, Commented May 2, 2014 at 14:32
The code succeeds when I run it on an Ubuntu box, and it also runs on r-fiddle. Can you add sessionInfo(), please? r-fiddle.org/#/fiddle?id=AfoyOSGm
– Steph Locke, Commented May 2, 2014 at 14:36
sessionInfo() added! I suspect I have the answer already, though. This is almost certainly being caused by my work's proxy. I've hit issues with this before (via QGIS) and have never found a satisfactory solution.
– Tumbledown, Commented May 6, 2014 at 12:13
@Tumbledown, I had the same problem. However, after I restarted my R session it worked again... weird.
– Jacob H, Commented Jan 23, 2016 at 0:07

luidam · Accepted Answer · 2015-03-05 15:19:45Z

12

The XML package has some issues. The problem is intermittent and has nothing to do with the URL. I solved it by using the GET function from the httr package to obtain the HTML, then passing the result to htmlParse; see below:

library("XML")
library(httr)
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(rawToChar(GET(url)$content))
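A slightly more defensive variant of the same idea, as a sketch only: it checks the HTTP status before parsing, decodes the body as text, and optionally routes the request through a proxy, since a work proxy was suspected in the comments on the question. The helper names fetch_html and parse_html_text are made up for illustration, and the proxy host and port in the usage comments are placeholders, not real settings.

```r
library(XML)
library(httr)

# Hypothetical helper: parse HTML that is already held in memory as a string.
parse_html_text <- function(html_text) {
  htmlParse(html_text, asText = TRUE)
}

# Hypothetical helper: download with httr, then hand the text to the XML package.
fetch_html <- function(url, proxy_host = NULL, proxy_port = NULL) {
  # Only go through a proxy when a host is supplied.
  resp <- if (is.null(proxy_host)) {
    GET(url)
  } else {
    GET(url, use_proxy(proxy_host, proxy_port))
  }
  # Turn HTTP errors (4xx/5xx) into R errors instead of parsing an error page.
  stop_for_status(resp)
  # Decode the raw body; UTF-8 is an assumption about the page encoding.
  html_text <- content(resp, as = "text", encoding = "UTF-8")
  parse_html_text(html_text)
}

# Usage (not run here because it needs network access):
# doc <- fetch_html(url)
# doc <- fetch_html(url, proxy_host = "proxy.example.com", proxy_port = 8080)
```

Keeping the download and the parse in separate functions makes it easier to see which step fails: a network or proxy problem surfaces in GET or stop_for_status, while malformed markup surfaces in htmlParse.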

edited Mar 5, 2015 at 15:19

answered Mar 5, 2015 at 15:14

luidam


hrbrmstr · Accepted Answer · 2015-07-25 02:00:31Z

5

You can also use the rvest and xml2 packages:

library(rvest) # github version
library(xml2)  # github version

url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- read_html(url)

doc %>% 
  html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>% 
  html_attr("href")

## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/"
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/"
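For anyone staying with the XML package, roughly the same link extraction can be sketched with XPath instead of CSS selectors. The HTML string below is made up for illustration so the example runs without network access; in practice doc would be the parsed page.

```r
library(XML)

# Made-up sample markup standing in for the downloaded page.
sample_html <- paste0(
  "<html><body>",
  "<a href='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/'>Overnight</a>",
  "<a href='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/'>Day only</a>",
  "<a href='http://example.org/other/'>Other</a>",
  "</body></html>"
)

doc <- htmlParse(sample_html, asText = TRUE)

# XPath counterpart of the CSS selector a[href^='...'].
prefix <- "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/"
xpath <- sprintf("//a[starts-with(@href, '%s')]", prefix)

# Collect the href attribute of every matching <a> node.
links <- xpathSApply(doc, xpath, xmlGetAttr, "href")
print(links)
```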

answered Jul 25, 2015 at 2:00

hrbrmstr


1 Comment

mccurcio Over a year ago

This second set of commands returns a set of data, whereas the previous one returned a value that could not be searched as easily.
