Hi, I have a text file that looks like this:

[1] "Development Name - Woodstock Terrace"                   
[2] "Location - 920 Trinity Avenue, Bronx 10456"             
[3] "Number of Apts. - 319"                                  
[4] "Type of Project - Co-op"                                
[5] "Development Name - York Hill Apartments"                
[6] "Location - 1540 York Avenue, New York 10028"            
[7] "Number of Apts. - 296"                                  
[8] "Type of Project - Co-op"

I want a dataframe with columns for the development name, location, number of apartments, and type of project. Each new row starts with a new development name. In the actual file there are a few hundred rows.

Not sure how to do this. Maybe using " - " as a separator with read_delim? Please help!

  • Split the column on " - ", then do a long-to-wide transformation. Commented May 18, 2021 at 15:25
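
One way to read this comment, sketched in base R. This is only an illustration, not the commenter's code: the names `txt`, `key`, `value`, `id`, `long` and `wide` are made up here, and the sketch assumes that every record begins with a "Development Name" line.

```r
# Sample lines copied from the question
txt <- c("Development Name - Woodstock Terrace",
         "Location - 920 Trinity Avenue, Bronx 10456",
         "Number of Apts. - 319",
         "Type of Project - Co-op",
         "Development Name - York Hill Apartments",
         "Location - 1540 York Avenue, New York 10028",
         "Number of Apts. - 296",
         "Type of Project - Co-op")

# Split each line at the first " - " into a field name and a value
key   <- sub(" - .*$", "", txt)
value <- sub("^.*? - ", "", txt, perl = TRUE)

# Assumption: a "Development Name" line opens each record,
# so a running count of those lines gives a record id
id <- cumsum(key == "Development Name")

# Long format first, then long-to-wide with base reshape()
long <- data.frame(id = id, key = key, value = value,
                   stringsAsFactors = FALSE)
wide <- reshape(long, idvar = "id", timevar = "key", direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
```

Using a running record id rather than the position within each field means a record with a missing field would still line up, at the cost of relying on the "Development Name" line always being present.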

2 Answers

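Assuming the input is the character vector shown reproducibly in the Note at the end, we convert it to DCF format by replacing the first " - " on each line with ": " and inserting a newline before each line that starts with Development, so that every record is preceded by a blank line. We then read the result with read.dcf, convert it to a data frame and let type.convert fix the column types.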

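library(magrittr)

input %>%
  sub(" - ", ": ", .) %>%                # "field - value" becomes "field: value"
  sub("^(Development)", "\n\\1", .) %>%  # blank line before each record
  textConnection() %>%
  read.dcf() %>%                         # character matrix, one row per record
  as.data.frame() %>%
  type.convert(as.is = TRUE)             # e.g. Number of Apts. becomes integer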

giving:

      Development Name                         Location Number of Apts. Type of Project
1    Woodstock Terrace  920 Trinity Avenue, Bronx 10456             319           Co-op
2 York Hill Apartments 1540 York Avenue, New York 10028             296           Co-op
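
To see why read.dcf can parse the result, it may help to look at the intermediate text. The sketch below applies only the two sub() steps from above to a shortened copy of the sample; `sample_lines` and `dcf_lines` are names introduced here for illustration.

```r
# Shortened copy of the sample data (two fields per record)
sample_lines <- c("Development Name - Woodstock Terrace",
                  "Location - 920 Trinity Avenue, Bronx 10456",
                  "Development Name - York Hill Apartments",
                  "Location - 1540 York Avenue, New York 10028")

# Step 1: turn the first " - " on each line into the DCF separator ": "
dcf_lines <- sub(" - ", ": ", sample_lines)

# Step 2: prepend a newline to lines that start a new record
dcf_lines <- sub("^(Development)", "\n\\1", dcf_lines)

# Print the text that read.dcf would receive
cat(dcf_lines, sep = "\n")
```

Each printed line has the form "Field: value", and every "Development Name" line is preceded by an empty line, which is how DCF separates records.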

Note

input <- c("Development Name - Woodstock Terrace", "Location - 920 Trinity Avenue, Bronx 10456", 
"Number of Apts. - 319", "Type of Project - Co-op", "Development Name - York Hill Apartments", 
"Location - 1540 York Avenue, New York 10028", "Number of Apts. - 296", 
"Type of Project - Co-op")
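
If the text file literally contains the `[1] "..."` prefixes and quotes shown in the question, they would need to be stripped before the pipeline above is applied. A minimal sketch, assuming that layout: the file is a temporary one written here only so the example is self-contained, and `path`, `raw` and `input_from_file` are illustrative names.

```r
# Write a small stand-in for the real file (assumed layout: printed R vector)
path <- tempfile(fileext = ".txt")
writeLines(c('[1] "Development Name - Woodstock Terrace"                   ',
             '[2] "Location - 920 Trinity Avenue, Bronx 10456"             '),
           path)

# Read the file as plain lines
raw <- readLines(path)

# Drop the leading index such as [1], the surrounding quotes
# and any trailing padding, keeping only the quoted text
input_from_file <- sub('^[[:space:]]*\\[[0-9]+\\][[:space:]]*"(.*)"[[:space:]]*$',
                       "\\1", raw)
```

The resulting character vector has the same shape as `input` above and could be used in its place.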

Read your text as a data frame with one column. Let's name the column X1:

library(tidyverse)  # loads tibble, stringr, dplyr and tidyr

df=tibble(X1=c("Development Name - Woodstock Terrace",
               "Location - 920 Trinity Avenue",
               "Number of Apts. - 319",
               "Type of Project - Co-op",
               "Development Name - York Hill Apartments",
               "Location - 1540 York Avenue",
               "Number of Apts. - 296",
               "Type of Project - Co-op"))

Create the Columns and Values vectors and combine them into a new data frame:

# Split each line at the first " - " into a field name and its value
parts=str_split_fixed(df$X1,' - ',2)
Columns=parts[,1]
Values=parts[,2]
df0=tibble(Cols=Columns,Vals=Values)

Pivot the new data frame wider. The per-field row counter is needed to identify each development; without it, pivot_wider reports "Values in `values_from` are not uniquely identified; output will contain list-cols".

df1=df0%>%
  group_by(Cols)%>%
  mutate(row = row_number())%>%   # n-th occurrence of a field = n-th development
  ungroup()%>%
  pivot_wider(id_cols = row,names_from = Cols,values_from=Vals)%>%
  select(-row)

> df1
# A tibble: 2 x 4
  `Development Name`   Location           `Number of Apts.` `Type of Project`
  <chr>                <chr>              <chr>             <chr>            
1 Woodstock Terrace    920 Trinity Avenue 319               Co-op            
2 York Hill Apartments 1540 York Avenue   296               Co-op   
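
Numbering rows within each field works when every development lists all four fields. If some records might lack a field, a variant that numbers records by their "Development Name" line may be safer. The following is a sketch under that assumption: the sample values are made up for illustration (the first record deliberately has no "Number of Apts." line), and `txt` and `wide` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Made-up sample: the first record has no "Number of Apts." line
txt <- c("Development Name - A",
         "Location - 1 Main St",
         "Type of Project - Co-op",
         "Development Name - B",
         "Location - 2 Main St",
         "Number of Apts. - 10",
         "Type of Project - Rental")

wide <- tibble(X1 = txt) %>%
  # Split at the first " - "; extra = "merge" keeps any later " - " in the value
  separate(X1, into = c("Cols", "Vals"), sep = " - ", extra = "merge") %>%
  # Assumption: each record starts with a "Development Name" line
  mutate(id = cumsum(Cols == "Development Name")) %>%
  pivot_wider(id_cols = id, names_from = Cols, values_from = Vals) %>%
  select(-id)
```

With this approach a missing field shows up as NA in its own record instead of shifting values from later records upward.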
