I am trying to create a data frame of color IDs, descriptions, and dates from this site (Pantone's Colorstrology page). The site takes a day and a month through dropdown menus and returns what I think is a dynamically generated (JavaScript) page. I'm new to coding and thought this would be a fun toy project. I'd like to use RSelenium to automate the dropdown selection and rvest to scrape the generated content. The data frame structure I'm hoping for looks like this:
```
description, date, meta
"paragraph about birthday", Jun 01, "DAFFODIL PANTONE 17-1512 POWERFUL KNOWING EXPRESSIVE"
```
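To make the target shape concrete, here is a minimal sketch that builds the one-row example by hand. It also pulls out a separate color ID column. That step assumes the ID always appears as `PANTONE NN-NNNN` inside `meta`, which is inferred from this single sample. The column name `pantone_id` is hypothetical.

```r
# One-row version of the target data frame, using the sample values above.
target <- data.frame(
  description = "paragraph about birthday",
  date = "Jun 01",
  meta = "DAFFODIL PANTONE 17-1512 POWERFUL KNOWING EXPRESSIVE",
  stringsAsFactors = FALSE
)

# Assumption: the color ID always appears as "PANTONE NN-NNNN" inside meta.
# Rows without that pattern get NA instead of the unchanged meta string.
id_pattern <- ".*PANTONE ([0-9]{2}-[0-9]{4}).*"
target$pantone_id <- ifelse(grepl(id_pattern, target$meta),
                            sub(id_pattern, "\\1", target$meta),
                            NA_character_)

str(target)
```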
My plan is to start with a `for` loop that iterates through each month of the year for a single day. Once that works, I want to extend it to cover every day of every month.
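The two-step plan above can be expressed as a table of month/day pairs, generated up front, that the loop then walks through one row at a time. The following is a minimal sketch that makes two assumptions. First, the month labels in the dropdown are the three-letter abbreviations in base R's `month.abb`; the question actually scrapes the real labels into `page_i`. Second, the site only needs real calendar dates, so impossible dates such as Feb 30 are dropped.

```r
# Assumption: month.abb (c("Jan", ..., "Dec")) matches the dropdown labels.
months <- month.abb
days <- 1:31

# Step 1: a single day (the 5th) for every month -> 12 combinations.
step1 <- data.frame(month = months, day = 5L, stringsAsFactors = FALSE)

# Step 2: every day for every month. expand.grid varies its first argument
# fastest, so rows run Jan 1..31, Feb 1..31, and so on.
all_combos <- expand.grid(day = days, month = months, stringsAsFactors = FALSE)

# Drop impossible dates (e.g. Feb 30) but keep Feb 29 as a valid birthday.
days_in_month <- c(31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
month_num <- match(all_combos$month, month.abb)
valid_combos <- all_combos[all_combos$day <= days_in_month[month_num], c("month", "day")]
rownames(valid_combos) <- NULL

nrow(step1)
nrow(valid_combos)
head(valid_combos, 3)
```

Each row of `valid_combos` would then drive one iteration. You would build the month XPath from `valid_combos$month[i]` and the day option index from `valid_combos$day[i]`. This only lines up if the day dropdown has no placeholder entry before day 1. The 31 labels scraped into `page_i[13:43]` suggest that it does not.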
I'm stuck on simply getting the loop to iterate through each month and retrieve the content. I could use some conceptual help with this part of the task first, and I'd appreciate any insight!
```r
library(RSelenium)
library(rvest)
library(tidyverse)
library(xml2)

## first run: docker run -d -p 4445:4444 selenium/standalone-chrome
## open a new connection to Chrome
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()
remDr$navigate("https://www.pantone.com/pages/iphone/iphone_colorstrology.html#___1__") # entering our URL gets the browser to navigate to the page
remDr$screenshot(display = TRUE)

#### create list of month/days
month_day <- read_html(remDr$getPageSource()[[1]])
page_i <- month_day %>%
  html_nodes(".list") %>%
  html_children() %>%
  html_text()

months <- page_i[1:12]
months <- paste0("'", months, "'")   # wrap each label in quotes for use inside the XPath
days <- page_i[13:43]
days <- as.numeric(days)

## create an XPath for each month's <option>
## (paste0() is vectorised, so this returns one XPath per month without a loop)
elements <- paste0("//option[contains(text(),", months, ")]")

## attempt at loop
total <- data.frame()
for (e in elements) {
  remDr$navigate("https://www.pantone.com/pages/iphone/iphone_colorstrology.html#___1__")
  print(e)
  month <- remDr$findElement(using = 'xpath', e)
  month$clickElement()
  day <- remDr$findElement(using = 'xpath', "//select[@id='lstDay']//option[5]") ## arbitrarily picking the 5th of each month
  day$clickElement()
  submit <- remDr$findElement(using = 'xpath', "/html[1]/body[1]/form[1]/div[1]/a[1]")
  submit$clickElement()

  ## parse the page the browser is currently showing and strip surrounding whitespace
  html <- read_html(remDr$getPageSource()[[1]])
  description <- html %>% html_nodes(xpath = "//tr//tr[2]//td[1]") %>% html_text() %>% trimws()
  meta <- html %>% html_nodes(xpath = "//td[@id='tdBg']") %>% html_text() %>% trimws()
  date <- html %>% html_nodes(xpath = "//td[@id='bgHeaderDate']//div") %>% html_text() %>% trimws()
  df <- data.frame(cbind(description, meta, date))
  total <- rbind(total, df)
}
```
I'm not getting any errors, but the results are unexpected each time. Either a single month/day combination repeats (e.g. Jan 05 twelve times), or I get repeated blocks such as Jan 05 three times, then Apr 05 three times, and so on.
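Two unverified guesses could each produce repeats like these. First, `getPageSource()` may run before the JavaScript has replaced the previous result, so an iteration re-reads the last month's content. Second, if an XPath such as `//tr//tr[2]//td[1]` matches several nodes, `cbind()` recycles the shorter vectors, and one submission becomes several rows with the same date. The sketch below is plain base R with two hypothetical helpers. `wait_until()` polls a condition until it is true or a timeout passes. `one_value()` collapses a scraped vector to exactly one value, so each submission contributes one row.

```r
# Hypothetical helper: call `condition()` repeatedly until it returns TRUE,
# sleeping `interval` seconds between attempts; give up after `timeout` seconds.
wait_until <- function(condition, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(invisible(TRUE))
    if (Sys.time() >= deadline) stop("Timed out waiting for the page to update")
    Sys.sleep(interval)
  }
}

# Hypothetical helper: reduce a scraped character vector to a single value.
# Empty results become NA; multiple matches are pasted together, so a
# submission always yields one row instead of being recycled by cbind().
one_value <- function(x) {
  x <- trimws(x)
  x <- x[nzchar(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
}

# Inside the loop, one row per submission could then be built like this:
# row <- data.frame(description = one_value(description),
#                   meta = one_value(meta),
#                   date = one_value(date))
```

In the loop, you would keep the previous iteration's `date` value. Right after `submit$clickElement()`, you would call `wait_until()` with a function that re-reads `remDr$getPageSource()[[1]]` and returns `TRUE` only once the `bgHeaderDate` text differs from that previous value. A fixed `Sys.sleep()` is a cruder alternative; whether it is long enough depends on how quickly the page responds.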

