I have a series of functions which goes to a website and collects data. Sometimes the website returns a 404 error and my code breaks. It could take 10 minutes of processing until I get a 404 error, or the code (more often then not) runs without the 404 error.
I have the following code:
linkToStopAt = as.character(unique(currentData$linkURL)[1])
myLinksToSearchOver = as.character(unique(currentData$page))
tmp = NULL
i <- 1
out_lst = list()
while(i <= length(myLinksToSearchOver)){
print(paste("Processing page: ", i))
tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>%
add_column(page = myLinksToSearchOver[i])
if(linkToStopAt %in% tmp$linkURL)
{
print(paste("We stopped at: ", i))
break
}
out_lst[[i]] <- tmp
i <- i + 1
}
Broken down as:
linkToStopAt = as.character(unique(currentData$linkURL)[1]) gives me a single URL where the while loops will break if it see this URL
myLinksToSearchOver = as.character(unique(currentData$page)) gives me multiple links in which the while loop will search over, once it finds the linkToStopAt on one of these links, the while loop breaks.
tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>% add_column(page = myLinksToSearchOver[i]) This is a big function, which relies on many other functions...
######################################################
So, the while loop runs until it finds a link linkToStopAt on one of the pages from myLinksToSearchOver. The function possible_collectPageData just does all my scraping/data processing etc. Each page from myLinksToSearchOver is stored in out_lst[[i]] <- tmp.
I recieve a specific error "Error in if (nrow(df) != nrow(.data)) { : argumento tiene longitud cero" in the console sometimes.
What I want to do, is something like:
repeat {
tmpCollectData <- try(while("ALL-MY-WHILE-LOOP-HERE??")) #try(execute(f))
if (!(inherits(tmpCollectData, "Error in if (nrow(df) != nrow(.data)) { : argumento tiene longitud cero")))
break
}
Where, if the while loop breaks with that error, just run it all again, setting tmp = NULL, i = 1, out_list = list() etc. (Basically start again, I can do this manually by just re-executing the code again)