I have a series of functions which go to a website and collect data. Sometimes the website returns a 404 error and my code breaks. It can take 10 minutes of processing before I get a 404 error, and more often than not the code runs without one.

I have the following code:

linkToStopAt = as.character(unique(currentData$linkURL)[1])
myLinksToSearchOver = as.character(unique(currentData$page))
tmp = NULL
i <- 1
out_lst = list()
while(i <= length(myLinksToSearchOver)){
  print(paste("Processing page: ", i))
  tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>% 
    add_column(page = myLinksToSearchOver[i])
  if(linkToStopAt %in% tmp$linkURL)
  {
    print(paste("We stopped at: ", i))
    break
  }
  out_lst[[i]] <- tmp
  i <- i + 1
}

Broken down as:

linkToStopAt = as.character(unique(currentData$linkURL)[1]) gives me a single URL; the while loop will break if it sees this URL.

myLinksToSearchOver = as.character(unique(currentData$page)) gives me multiple links which the while loop will search over; once it finds linkToStopAt on one of these links, the while loop breaks.

tmp <- possible_collectPageData(myLinksToSearchOver[i]) %>% add_column(page = myLinksToSearchOver[i]) This is a big function, which relies on many other functions...

######################################################

So, the while loop runs until it finds a link linkToStopAt on one of the pages from myLinksToSearchOver. The function possible_collectPageData just does all my scraping/data processing etc. Each page from myLinksToSearchOver is stored in out_lst[[i]] <- tmp.

I sometimes receive a specific error in the console: "Error in if (nrow(df) != nrow(.data)) { : argumento tiene longitud cero" (Spanish for "argument has length zero").

What I want to do, is something like:

repeat {
  tmpCollectData <- try("ALL-MY-WHILE-LOOP-HERE??")  # try(execute(f))
  if (!inherits(tmpCollectData, "try-error"))
    break
}

Where, if the while loop breaks with that error, I just run it all again, setting tmp = NULL, i = 1, out_lst = list(), etc. (basically start again; I can do this manually by just re-executing the code).
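A runnable version of that restart pattern uses tryCatch() rather than matching the error text. Here collect_all() is a hypothetical stand-in for the whole while loop (the real version would reset tmp, i and out_lst itself, so every retry starts from scratch); the stub below fails once on purpose just to show the restart happening:

```r
# Hypothetical stand-in for the full scraping while loop. It fails on
# the first call to mimic the intermittent error, then succeeds.
calls <- 0
collect_all <- function() {
  calls <<- calls + 1
  if (calls == 1) stop("argumento tiene longitud cero")
  list(page1 = "data")  # placeholder for the real out_lst
}

# Keep restarting from scratch until collect_all() finishes cleanly.
repeat {
  out_lst <- tryCatch(collect_all(), error = function(e) {
    message("Restarting after error: ", conditionMessage(e))
    NULL
  })
  if (!is.null(out_lst)) break
}
```

The error handler returns NULL, so the loop simply goes around again until a clean run completes.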

1 Answer

You could create a function that does your work, then wrap the call to that function in try() with silent=TRUE. Place that in a while(TRUE) loop, breaking out if get_data() does NOT return an error:

1. Function to do your work:
get_data <- function(links, stoplink) {
  i <- 1
  out_lst <- list()
  while (i <= length(links)) {
    print(paste("Processing page: ", i))
    
    tmp <- possible_collectPageData(links[i]) %>% add_column(page = links[i])
    
    if (stoplink %in% tmp$linkURL) {
      print(paste("We stopped at: ", i))
      break
    }
    
    out_lst[[i]] <- tmp
    i <- i + 1
  }
  return(out_lst)
}
2. Infinite loop that is broken once the result no longer contains an error:
while (TRUE) {
  result <- try(get_data(myLinksToSearchOver, linkToStopAt), silent = TRUE)
  if (!inherits(result, "try-error")) break
}
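If the 404 is transient, while(TRUE) can spin forever. A sketch of the same idea with a retry cap and a short pause between attempts (the max_tries value and the pause length are arbitrary choices, not part of the original answer):

```r
# Retry get_data() up to max_tries times, pausing between attempts.
# Returns the result on success, or NULL if every attempt errored.
get_data_with_retry <- function(links, stoplink, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- try(get_data(links, stoplink), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    message("Attempt ", attempt, " failed: ",
            conditionMessage(attr(result, "condition")))
    Sys.sleep(1)  # brief pause before the next attempt
  }
  NULL
}
```

A NULL return then signals that every attempt failed, which the caller can check explicitly instead of the script looping indefinitely.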

2 Comments

Thanks for your answer. I am not looking to return a NULL value; if the error occurs, just forget everything and start the while loop again. The error is just a 404 error because something strange occurred and the Firefox browser closed.
Okay, perhaps my edit helps?
