I'm scraping some newspaper sites for articles related to the subject "the fourth industrial revolution".
The script is supposed to open the site, log in, search for "fjerde industrielle revolution" ("fourth industrial revolution" in Danish), make all search results accessible, put all the headlines into a vector, iterate over the headlines in a function, and get the articles behind them.
I can open the results one by one and scrape them, but I need the script to iterate over them in a function and scrape all the articles.
If you see places in the code that could be improved, please let me know.
Thank you in advance for any feedback. Anders
```
library(rvest)
library(RSelenium)
library(dplyr)
library(tidyverse)
library(tidytext)
rD <- rsDriver(browser = "firefox")
driver <- rD[["client"]]
driver$navigate("https://www.berlingske.dk/")
# Accept only the necessary cookies ("Kun nødvendige")
Sys.sleep(1)
element <- driver$findElement(using = "link text", "Kun nødvendige")
Sys.sleep(1)
element$clickElement()
# Click "LOG IND" (log in)
Sys.sleep(1)
element <- driver$findElement("link text", "LOG IND")
#element$highlightElement()
element$clickElement()
Sys.sleep(1)
# Find the username field
element <- driver$findElement(using = "id", "email")
Sys.sleep(1)
# Enter the username
element$sendKeysToElement(list("*******@hotmail.com"))
Sys.sleep(1)
# Find the password field
element <- driver$findElement(using = "id", "password")
# Enter the password and submit
element$sendKeysToElement(list("**********", key = "enter"))
Sys.sleep(1)
#Find menu-icon and click it
element <- driver$findElement(using = "css selector", ".lp_nav_menu > ul:nth-child(3) > li:nth-child(5) > a:nth-child(1)")
element$clickElement()
Sys.sleep(1)
# Select the search input box
element <- driver$findElement(using = "css", "#site-search")
#element$clickElement()
# Send the search text to the input box and submit
Sys.sleep(1)
element$sendKeysToElement(list("fjerde industrielle revolution", key = "enter"))
Sys.sleep(1)
# Click "load more" until all search results are shown
tryCatch({
  Sys.sleep(1)
  suppressMessages({
    loadmore <- driver$findElement("css selector", "button.btn:nth-child(1)")
    while (loadmore$isElementDisplayed()[[1]]) {
      loadmore$clickElement()
      Sys.sleep(1)
      loadmore <- driver$findElement("css selector", "button.btn:nth-child(1)")
    }
  })
}, error = function(e) {
  NA_character_
})
# Get headlines - works, makes a character vector of unique headlines
# (note: unique() takes no second argument here; piping headers into
# unique(element) would pass `element` as the `incomparables` argument)
element <- driver$findElements(using = "css selector", "h4:nth-child(2) > a:nth-child(1)")
headers <- unlist(lapply(element, function(x) x$getElementText())) %>% unique()
#Opens the first link
element <- driver$findElement(using="css selector", "h4:nth-child(2) > a:nth-child(1)")
element$clickElement()
```
This code is used after a search result has been opened:
```
#Finds and gets headline
artikel1_overskrift <- driver$findElement(using="css", value=".article-header__title")
artikel1_overskrift <- artikel1_overskrift$getElementText()
#Finds and gets the intro
artikel_indledning <- driver$findElement(using="css", value="#articleHeader > p")
artikel_indledning <- artikel_indledning$getElementText()
#Finds and gets element holding date etc.
artikel_dato.m.m. <- driver$findElement(using="css", value=".col-lg-11")
artikel_dato.m.m. <- artikel_dato.m.m.$getElementText()
#Finds and gets body of article
artikel1_body <- driver$findElement(using = "css", value="#articleBody")
artikel1_body <- artikel1_body$getElementText()
```
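
For the iteration, it may help to wrap the four per-article steps above in a single function. This is only a sketch: the function name `scrape_article` and the returned tibble layout are my own suggestions, while the CSS selectors are the ones taken from the code above.

```r
# Sketch: the per-article scraping steps as one reusable function.
# `scrape_article` and the tibble layout are suggested names;
# the CSS selectors are copied from the code above.
scrape_article <- function(driver) {
  get_text <- function(selector) {
    el <- driver$findElement(using = "css", value = selector)
    unlist(el$getElementText())
  }
  tibble::tibble(
    overskrift = get_text(".article-header__title"),  # headline
    indledning = get_text("#articleHeader > p"),      # intro
    dato_m_m   = get_text(".col-lg-11"),              # date etc.
    body       = get_text("#articleBody")             # article body
  )
}
```

With this in place, each opened article page needs only one call, `scrape_article(driver)`, and the results stack neatly with `dplyr::bind_rows()`.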
Edit:

I want this function to iterate over a list containing the headers, comparing them to the headlines of the search results (i.e. the links to the articles), but R throws an error:

    comparison (1) is possible only for atomic and list types

I've tried a tibble, data_frame, list and character/atomic vector with no change.
Does somebody have a suggestion as to what could be causing the error?
```
for (i in seq_along(headers)) {
  if (headers[i] == driver$findElement(using = "css selector", value = ".teaser__title-link")) {
    element <- driver$findElement(using = "css selector", value = ".teaser__title-link")
    element$clickElement()
  } else {
    print("No luck!")
  }
}
```
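
For what it's worth, the error comes from the comparison itself: `driver$findElement()` returns a webElement object, not text, so `headers[i] == driver$findElement(...)` compares a string to a non-atomic object; you would need `$getElementText()` before comparing. Clicking a result also navigates away from the results page, which breaks the loop on the next iteration. A sketch of one way around both problems, assuming the result-link selector from above and using `scrape_article()` as a placeholder for whatever per-article scraping code you use: collect each result's URL up front, then navigate to the URLs one by one.

```r
# Sketch: gather the href of every search result first, then visit each URL.
# The selector is the one from the question; `scrape_article` is a
# placeholder for the per-article scraping code.
links <- driver$findElements(using = "css selector",
                             "h4:nth-child(2) > a:nth-child(1)")
urls <- unique(unlist(lapply(links, function(x) x$getElementAttribute("href"))))

artikler <- vector("list", length(urls))
for (i in seq_along(urls)) {
  driver$navigate(urls[[i]])
  Sys.sleep(1)
  artikler[[i]] <- scrape_article(driver)  # your scraping code here
}
```

Because the URLs are saved before any navigation happens, losing the results page no longer matters, and there is no string-to-element comparison at all.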