The code below scrapes http://www.bls.gov/schedule/news_release/2015_sched.htm and extracts every date whose Release column contains "Employment Situation".
library(rvest)

pg <- read_html("http://www.bls.gov/schedule/news_release/2015_sched.htm")
# target only the <td> elements under the bodytext div
body <- html_nodes(pg, "div#bodytext")
# use this node set and a relative XPath to find the matching <td> elements,
# then step up to the parent row and take its first <td> (the date cell)
es_nodes <- html_nodes(body, xpath = ".//td[contains(., 'Employment Situation for')]/../td[1]")
# clean up the cruft and parse the dates
nfpdates2015 <- as.Date(trimws(html_text(es_nodes)), format = "%A, %B %d, %Y")
# thanks to @hrbrmstr for this
I would like to repeat this for other URLs that are named the same way, with only the year changing. Specifically, for the following URLs:
#From 2008 to 2015
http://www.bls.gov/schedule/news_release/2015_sched.htm
http://www.bls.gov/schedule/news_release/2014_sched.htm
...
http://www.bls.gov/schedule/news_release/2008_sched.htm
My knowledge of rvest, HTML, and XML is almost non-existent. I tried applying the same code in a for loop, but my efforts were futile. Of course I could simply repeat the 2015 code eight times to cover all the years; that would take neither much time nor much space. Still, I am curious how this could be done more efficiently. Thank you.
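For what it's worth, here is one idiomatic way this is often done: build the URLs with sprintf(), wrap the working 2015 logic in a function of a single URL, and apply it over the vector of URLs with lapply(). This is a sketch, not tested against the live site; the function name get_nfp_dates is my own invention.

```r
library(rvest)

# The only thing that changes between pages is the year, so build all URLs at once.
years <- 2008:2015
urls  <- sprintf("http://www.bls.gov/schedule/news_release/%d_sched.htm", years)

# Wrap the working 2015 code in a function that takes any one of these URLs.
get_nfp_dates <- function(url) {
  pg   <- read_html(url)
  body <- html_nodes(pg, "div#bodytext")
  es   <- html_nodes(body,
            xpath = ".//td[contains(., 'Employment Situation for')]/../td[1]")
  as.Date(trimws(html_text(es)), format = "%A, %B %d, %Y")
}

# lapply() returns a list of Date vectors, one element per year;
# do.call(c, ...) flattens them into a single Date vector.
# nfp_dates <- do.call(c, lapply(urls, get_nfp_dates))
```

The last line is commented out because it makes eight network requests; uncomment it to run the full scrape. A plain for loop over urls would work just as well; lapply() simply saves you from growing a result object by hand.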