I'm trying to scrape all the providers from this page: https://www.agedcareguide.com.au/nursing-homes/providers/vic

I'm using RSelenium on my Mac. I start a Selenium server in Docker by running the following in Terminal:

docker run -d -p 4445:4444 selenium/standalone-firefox

Then when I return to RStudio and run the following:

library(RSelenium)

# Connect to the Selenium server exposed by the Docker container on port 4445
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L,
                      browserName = "firefox")
remDr$open()
remDr$navigate("https://www.agedcareguide.com.au/nursing-homes/providers/vic")
remDr$getTitle()

All is good.

Then I try to get the element by using:

provs <- remDr$findElement()

Inside the brackets I have tried the XPath, the CSS selector, everything I can think of, but it always fails with:

Error in match.arg(using) : 'arg' should be one of “xpath”, “css selector”, “id”, “name”, “tag name”, “class name”, “link text”, “partial link text”

Anybody got any ideas where I'm going so terribly wrong?

  • This seems to work without an error: provs <- remDr$findElement(using = "class", value = "c-result-list"). Note that this only finds the element; getting its contents takes a bit of further processing. An alternative is to call page <- remDr$getPageSource() after your navigate line, and then use rvest or similar to extract what you want from page. Commented Mar 27, 2018 at 10:00
  • Looking at the page source through right-clicking, the elements aren't there in text. Not sure how rvest would then be able to find them? And trying the findElement option returns an analysis on the browser. Commented Mar 27, 2018 at 22:11
  • Yes, although the 'selector' code seems to work - but you might need to also build in a delay. See answer below. Good luck! Commented Mar 28, 2018 at 9:21
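
The error in the question comes from match.arg(using): findElement() appears to take the locator strategy as its first argument (using) and the selector itself as the second (value), so a selector passed on its own is checked against the list of strategies and rejected. The sketch below reproduces that behaviour in plain R without a browser. find_element_like is a hypothetical stand-in written for illustration, not part of RSelenium.

```r
# Hypothetical stand-in with the same argument order as findElement():
# `using` comes first, `value` second.
find_element_like <- function(using = c("xpath", "css selector", "id", "name",
                                        "tag name", "class name", "link text",
                                        "partial link text"),
                              value) {
  using <- match.arg(using)  # rejects anything that is not one of the choices
  list(using = using, value = value)
}

# A selector passed on its own lands in `using` and triggers the error
bad <- tryCatch(find_element_like(".c-result-list"),
                error = function(e) conditionMessage(e))

# Naming the strategy and passing the selector as `value` works
good <- find_element_like(using = "css selector", value = ".c-result-list")
```

With a live session the equivalent call would be remDr$findElement(using = "css selector", value = ".c-result-list"), where the class name is the one suggested in the first comment above and may not match the current page. Calling getElementText() on the returned element should then give its text.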

1 Answer

A partial solution...

with RSelenium...

remDr$navigate(...)
Sys.sleep(20)  # the page keeps loading for some time, so wait before reading the source
page <- remDr$getPageSource()
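
A fixed 20-second sleep works, but it waits the full time even when the page is ready sooner and may still be too short on a slow connection. One alternative is to poll until a condition holds. The following is a minimal sketch; wait_until is a hypothetical helper name, and the counter is only a stand-in condition so the example runs without a browser.

```r
# Poll `condition` (a function returning TRUE/FALSE) until it is TRUE
# or `timeout` seconds have passed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(condition, timeout = 20, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in condition: becomes TRUE on the third check
checks <- 0
ready <- wait_until(function() {
  checks <<- checks + 1
  checks >= 3
}, timeout = 5, interval = 0.1)
```

With RSelenium, the condition could be something like function() length(remDr$findElements(using = "css selector", value = ".c-result-list")) > 0, assuming the results container really carries that class.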

then, with rvest...

library(rvest)

# getPageSource() returns a list; the HTML string is its first element
provs <- page[[1]] %>% read_html() %>%
  html_node("#app > div > div.c-col-results > div:nth-child(3) > div > section") %>%
  html_text()

after a bit of tidying (split by \\n, remove blanks)...

provs
 [1] "AdventCare"                                     "Providing nursing homes" 
 [3] "Alexandra Gardens SRS"                          "Providing nursing homes" 
 [5] "Allbright Manor"                                "Providing nursing homes"
 [7] "Alliance Care Services Group"                   "Providing nursing homes" 
 etc...
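
The tidying step mentioned above (split by \n, remove blanks) could look roughly like this. It is a sketch only: the raw string is an invented stand-in for what html_text() returns, reusing two names from the output above, and tidy_lines is a hypothetical helper name.

```r
# Invented stand-in for the text returned by html_text();
# the real string will have different whitespace.
raw <- "\n  AdventCare \n\n Providing nursing homes\n\n  Alexandra Gardens SRS\n Providing nursing homes \n"

tidy_lines <- function(x) {
  parts <- strsplit(x, "\n", fixed = TRUE)[[1]]  # split on newlines
  parts <- trimws(parts)                         # strip surrounding whitespace
  parts[nzchar(parts)]                           # drop empty entries
}

provs <- tidy_lines(raw)

# Keep every other entry, assuming each provider name is followed
# by exactly one description line
provider_names <- provs[c(TRUE, FALSE)]
```

If some providers have no description line, or more than one, the alternating pattern breaks, and filtering out the "Providing nursing homes" entries by value would be safer.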

Hopefully this will help get you started, although it is a tricky one!

1 Comment

Thanks for this - it works to get the data out which is great. Now I've got to master all the tidying which is a bit frustrating!
