Convert character string list into dataframe in R

Question

I am scraping the Newark Liberty International Airport's website to keep track of their daily schedules. Here is the piece of code I have developed:

library(rvest)

url <- read_html('https://www.airport-ewr.com/newark-departures-terminal-C?tp=6&day=tomorrow')

population <- url %>% html_nodes(xpath = '//*[@id="flight_detail"]') %>% 
              html_text() %>% gsub(pattern = '\\t|\\r|\\n', replacement = ' ') %>% 
              trimws() %>% gsub(pattern = '\\s+', replacement = " ")

The gsub() calls replace tabs, carriage returns, and newlines with spaces and collapse repeated whitespace within the text, while trimws() strips leading and trailing whitespace. The code works well, and I have attached a snippet of the output:

I want to convert this character string into a dataframe which would contain values as shown below:

Any help is appreciated!

Can you please share the data as text? An image will not help people work on your problem.
– MKR, Commented Apr 4, 2018 at 20:53
Please do not show images of data, just give the data itself (preferably in an easy-to-copy format like dput(head(x))). This is absolutely a regular-expression problem, which means it will take a lot of work to make it robust. Is there another format in which you can retrieve that data?
– r2evans, Commented Apr 4, 2018 at 20:55
You could use trimws. What is the algorithm for splitting this string? What have you tried?
– Roman Luštrik, Commented Apr 4, 2018 at 21:33
Please post the snippet as plain text to make your example reproducible, so people can copy and paste it. That's the starting point for this question.
– smci, Commented Apr 4, 2018 at 22:43

Mako212 · Accepted Answer · 2018-04-04 22:31:26Z

Try this out:

library(rvest)

url <- read_html('https://www.airport-ewr.com/newark-departures-terminal-C?tp=6&day=tomorrow')


population <- url %>% html_nodes(xpath = '//*[@id="flight_detail"]') %>% 
              html_text()

First we read in the raw text rows. Within each row the fields are separated by \n, but sometimes by more than one, so we first use gsub() to collapse each run of \n into a single delimiter, then split each string on \n with strsplit(), and finally rbind the pieces into a data.frame:

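# Collapse runs of newlines, split each row on "\n", and bind the rows together
popDF <- as.data.frame(
  do.call(rbind, strsplit(gsub("\n+", "\n", population), split = "\n", fixed = TRUE)),
  stringsAsFactors = FALSE
)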


  V1               V2                V3      V4       V5                V6 V7      V8                       V9
1      Austin  (AUS)   United Airlines  UA 2427 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
2      Austin  (AUS)               SAS  SK 6868 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
3      Boston  (BOS)   United Airlines  UA 1699 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
4    Columbus  (CMH)          CommutAir C5 4973 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
5    Columbus  (CMH)   United Airlines  UA 4973 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
6     Detroit  (DTW)  Republic Airlines YX 3482 06:00 am Depart:  06:00 am  C Term. C  Scheduled - On-time [+]
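
If you want to tidy the result further, here is a minimal sketch of the same collapse, split, and bind steps on two hypothetical raw rows. The raw strings are an assumption modelled on the printed output above (not text captured from the live page), and the column names are illustrative guesses rather than labels from the site:

```r
# Hypothetical raw rows: the field layout is an assumption modelled on the
# printed output above, not text captured from the live page.
raw <- c(
  "\nAustin  (AUS)\n\nUnited Airlines \nUA 2427\n06:00 am\nDepart:  06:00 am\nC\nTerm. C\n\nScheduled - On-time [+]",
  "\nBoston  (BOS)\nUnited Airlines \n\n\nUA 1699\n06:00 am\nDepart:  06:00 am\nC\nTerm. C\nScheduled - On-time [+]"
)

# Collapse each run of newlines into one, then split every row on "\n".
fields <- strsplit(gsub("\n+", "\n", raw), "\n", fixed = TRUE)

# Guard: rbind() recycles short rows with only a warning, so check counts first.
stopifnot(length(unique(lengths(fields))) == 1)

flights <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)

# Drop the empty first column produced by the leading "\n".
flights <- flights[, -1]

# Column names are illustrative guesses, not labels taken from the site.
names(flights) <- c("destination", "airline", "flight", "scheduled",
                    "departure", "terminal_code", "terminal", "status")

# Trim stray whitespace and collapse repeated spaces inside each value.
flights[] <- lapply(flights, function(x) gsub("\\s+", " ", trimws(x)))

str(flights)
```

The stopifnot() check matters because a row with a missing field would otherwise be recycled by rbind() and shift values into the wrong columns. If the live page yields rows of differing lengths, those rows would need to be handled separately before binding.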
