I am scraping Newark Liberty International Airport's website to keep track of its daily departure schedule. Here is the code I have developed so far:
library(rvest)

# Read and parse the departures page (Terminal C, tomorrow's schedule)
url <- read_html('https://www.airport-ewr.com/newark-departures-terminal-C?tp=6&day=tomorrow')

population <- url %>%
  html_nodes(xpath = '//*[@id="flight_detail"]') %>%
  html_text() %>%
  gsub(pattern = '\\t|\\r|\\n', replacement = ' ') %>%   # tabs/CR/LF -> spaces
  trimws() %>%                                           # strip leading/trailing whitespace
  gsub(pattern = '\\s+', replacement = ' ')              # collapse repeated spaces
Here gsub() replaces tabs, carriage returns and newlines with spaces and then collapses any run of whitespace into a single space, while trimws() strips the leading and trailing whitespace. The code works well, and I have attached a snippet of the output:
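To make the cleaning step concrete, here is a minimal standalone example of what that chain does; the raw string below is invented purely for illustration and is not taken from the actual page:

library(magrittr)  # provides %>% (rvest re-exports it as well)

# A made-up raw string, purely for illustration
raw <- "\n\t  UA 123 \r\n  San Francisco \t 08:15 AM  \n"

cleaned <- raw %>%
  gsub(pattern = '\\t|\\r|\\n', replacement = ' ') %>%   # tabs/CR/LF -> spaces
  trimws() %>%                                           # strip leading/trailing whitespace
  gsub(pattern = '\\s+', replacement = ' ')              # collapse repeated spaces

cleaned
# [1] "UA 123 San Francisco 08:15 AM"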
I want to convert this character string into a data frame containing values as shown below:
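For illustration only, here is a rough sketch of the kind of splitting I have in mind, run on made-up records; the assumed field layout (carrier, flight number, destination, departure time) and the regular expression are placeholders and would need to be adapted to the real scraped strings:

# Hypothetical cleaned strings, one per flight (invented for illustration)
flights <- c("UA 123 San Francisco 08:15 AM",
             "DL 456 Atlanta 09:40 PM")

# Assumed layout: two-letter carrier, flight number, destination, time
pattern <- "^([A-Z]{2}) (\\d+) (.+) (\\d{2}:\\d{2} [AP]M)$"
parts   <- regmatches(flights, regexec(pattern, flights))

flights_df <- do.call(rbind, lapply(parts, function(p) {
  data.frame(carrier = p[2], flight = p[3],
             destination = p[4], time = p[5],
             stringsAsFactors = FALSE)
}))

flights_df
# A 2 x 4 data frame with columns carrier, flight, destination and time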
Any help is appreciated!


Comments:
- Could you post sample data with dput(head(x))? This is absolutely a regular-expression problem, which means it will take a lot of work to make it robust. Is there another format in which you can retrieve that data?
- What is the algorithm for splitting this string? What have you tried?