I want to save a web page as an HTML file from R, given its URL. I tried saving the objects returned by GET (httr) and read_html (rvest) for that URL, but neither attempt saved the actual contents of the page.

url = "https://facebook.com"
get_object = httr::GET(url); save(get_object, "file.html")
html_object = rvest::read_html(url); save(html_object, "file.html")

Neither of these saves the correct output (i.e., the HTML content of the webpage) to the .html file.

  • What is the "correct" output and what are you getting? Commented Jun 7, 2016 at 19:54
  • The correct output I am looking for is the HTML content of the webpage in file.html. Instead, I am getting junk inside the file. Commented Jun 8, 2016 at 3:49

1 Answer

Use str(object) to figure out what you are working with. In both cases, you were trying to write non-text to a text file.
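
To illustrate the point about non-text objects, here is a minimal base R sketch. The list `obj` is a hypothetical stand-in for a response object (it is not what httr or rvest actually return), and the temporary file names are illustrative only. It shows that save() stores the serialized R object, while writing the character data produces readable markup.

```r
# Hypothetical stand-in for a response object: a list, not plain text
obj <- list(status = 200L, body = "<html><body>Hello</body></html>")
str(obj)  # shows a list with an integer and a character element

rdata_path <- tempfile(fileext = ".html")
text_path  <- tempfile(fileext = ".html")

# save() serializes the R object itself, so the file is not readable HTML
# despite its .html extension
save(obj, file = rdata_path)

# Writing only the character element produces the markup as plain text
writeLines(obj$body, text_path)
readLines(text_path)
```

Opening the first file in a browser or editor would show serialized data rather than markup, which is likely the "junk" described in the question.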

Here's how to get the text and write it using both of your libraries...

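url = "https://facebook.com"

# httr: extract the response body as text, then write it to a file
library(httr)
get_object = GET(url)
cat(content(get_object, "text"), file="temp.html")

# rvest: read_html() returns an xml2 document. write_xml() comes from xml2,
# so load xml2 explicitly in case your rvest version does not attach it.
library(rvest)
library(xml2)
html_object = read_html(url)
write_xml(html_object, file="temp.html")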