0

I have a vector of data (a column in a data frame):

 [1] "Tue 12-14 (w1-6, CLB 6)"           "Mon 18-20 (w1-6, ColomboThC)"      "Thu 14-16 (w1-6,7-9,10-12, CLB 8)"
 [4] "Fri 13 (w2-9,10-13, Law 388)"      "Fri 14 (w2-9,10-13, Sqhouse206)"   "Fri 15 (w2-9,10-13, Sqhouse115)"  
 [7] "Thu 17 (w2-9,10-13, Block G16)"    "Thu 18 (w2-9,10-13, Block G16)"    "Mon 10 (w2-9,10-13, AinswthG01)"  
[10] "Mon 11 (w2-9,10-13, Sqhouse203)"   "Mon 12 (w2-9,10-13, Sqhouse206)"   "Mon 13 (w2-9,10-13, BUS 114)"     
[13] "Mon 16 (w2-9,10-13, Gold G03)"     "Mon 17 (w2-9,10-13, Quad G047)"    "Mon 20 (w2-9,10-13, Col LG02)"    
[16] "Tue 17 (w2-9,10-13, Quad 1001)"    "Tue 18 (w2-9,10-13, Quad 1001)"    "Tue 19 (w2-9,10-13, Quad 1001)"   
[19] "Tue 20 (w2-9,10-13)"               "Wed 10 (w2-9,10-13, Quad 1046)"    "Wed 11 (w2-9,10-13, Quad 1046)"   
[22] "Wed 12 (w2-9,10-13, Quad 1046)"    "Wed 13 (w2-9,10-13, Quad G046)"

I would like to extract the parts of each string based on its pattern. For example, the expected output for the first element of the vector would be:

"Tue" "12-14"  "1-6" "CLB 6"

The expected output for the third element would be:

 "Thu" "14-16" c("1-6","7-9","10-12") "CLB 8"

where c("1-6","7-9","10-12") is a list.

(Note that each of these will be appended as a new column in my data frame.)

I'm thinking of using gsub to extract each part of the string. Are there other functions I could use?

Any advice is much appreciated :)

2
  • 2
    I think I can find ways to extract this using grep as well, but would like to work with someone who has at least posted their code and tried it once themselves, which is the most efficient way for stackoverflow to work properly :) Commented Aug 7, 2017 at 1:29
  • 1
    ?strsplit is often helpful for this sort of thing too. Specify what splits your segments of text. " (w" or ")" or ", " it looks like. Put each of those into a strsplit together like strsplit(x, "\\s+\\(w|\\)|,\\s+") and you will be half way there. Commented Aug 7, 2017 at 1:48
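
As a minimal sketch of the strsplit suggestion in the comment above, assuming a shortened two-element copy of the vector (called x here, a name not used in the question): the first call splits each string into its day/time, weeks, and location segments, and two further strsplit calls break up the day/time and the comma-separated weeks. An element without a location would give NA in location.

```r
# Shortened copy of the vector from the question (x is an assumed name)
x <- c("Tue 12-14 (w1-6, CLB 6)", "Thu 14-16 (w1-6,7-9,10-12, CLB 8)")

# Split at " (w", at ")" and at ", " in one pass
parts <- strsplit(x, "\\s+\\(w|\\)|,\\s+")

# First piece holds day and time, second the weeks, third the location
day_time <- strsplit(vapply(parts, `[`, character(1), 1), " ")
weeks    <- strsplit(vapply(parts, `[`, character(1), 2), ",")
location <- vapply(parts, `[`, character(1), 3)
```

Here day_time and weeks are lists of character vectors, so weeks could be stored as a list column if that is what is wanted.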

3 Answers

2

We can try functions from tidyverse:

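library(tidyverse)
# vec is the character vector shown in the question
str_split_fixed(vec, pattern = " ", n = 3) %>%   # day, time, bracketed remainder
  as.data.frame(stringsAsFactors = FALSE) %>%
  mutate(V3 = str_sub(V3, 3, -2)) %>%            # drop leading "(w" and trailing ")"
  separate(V3, c("V3", "V4"), sep = ", ", fill = "right")  # NA in V4 if no location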

The code works as follows:

  1. Split the vector vec at spaces into 3 columns and coerce the result to a data frame.
  2. Keep the substring of the 3rd column that excludes the opening "(w" and the closing bracket.
  3. Separate the 3rd column at ", ".

An example of the output:

   V1    V2            V3         V4
1 Tue 12-14           1-6      CLB 6
2 Mon 18-20           1-6 ColomboThC
3 Thu 14-16 1-6,7-9,10-12      CLB 8

1

Using the input x in the Note at the end:

  • replace the first space in each string with a semicolon,
  • replace the next space and its following two characters with semicolon,
  • replace the next occurrence of comma space with semicolon
  • remove the last character
  • read that into a data frame splitting fields at semicolons
  • split the 3rd column into a list of character vectors using commas -- omit this step if you prefer to have character string rather than list as 3rd column

It uses only individually simple steps and no packages.

y <- x
y <- sub(" ", ";", y)
y <- sub(" ..", ";", y)
y <- sub(", ", ";", y)
y <- sub(".$", "", y)
DF <- read.table(text = y, sep = ";", as.is = TRUE, fill = TRUE)
DF[[3]] <- strsplit(DF[[3]], ",")

giving:

> DF
    V1    V2              V3         V4
1  Tue 12-14             1-6      CLB 6
2  Mon 18-20             1-6 ColomboThC
3  Thu 14-16 1-6, 7-9, 10-12      CLB 8
4  Fri    13      2-9, 10-13    Law 388
5  Fri    14      2-9, 10-13 Sqhouse206
6  Fri    15      2-9, 10-13 Sqhouse115
7  Thu    17      2-9, 10-13  Block G16
8  Thu    18      2-9, 10-13  Block G16
9  Mon    10      2-9, 10-13 AinswthG01
10 Mon    11      2-9, 10-13 Sqhouse203
11 Mon    12      2-9, 10-13 Sqhouse206
12 Mon    13      2-9, 10-13    BUS 114
13 Mon    16      2-9, 10-13   Gold G03
14 Mon    17      2-9, 10-13  Quad G047
15 Mon    20      2-9, 10-13   Col LG02
16 Tue    17      2-9, 10-13  Quad 1001
17 Tue    18      2-9, 10-13  Quad 1001
18 Tue    19      2-9, 10-13  Quad 1001
19 Tue    20      2-9, 10-13           
20 Wed    10      2-9, 10-13  Quad 1046
21 Wed    11      2-9, 10-13  Quad 1046

The first 4 lines of code could be replaced with this single line, which reduces the whole solution to 4 lines of code:

y <- Reduce(function(x, pat) sub(pat, ";", x), init = x, c(" ", " ..", ", "))
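
To see why the Reduce call is equivalent to the chained sub calls, here is a small sketch on a two-element copy of the input (the names y_steps, y_fold and stages are made up for illustration). Reduce feeds the result of each sub into the next pattern, and accumulate = TRUE keeps every intermediate stage.

```r
# Shortened copy of the input, for illustration only
x <- c("Tue 12-14 (w1-6, CLB 6)", "Fri 13 (w2-9,10-13, Law 388)")
pats <- c(" ", " ..", ", ")

# Step-by-step version, as in the answer
y_steps <- sub(" ", ";", x)
y_steps <- sub(" ..", ";", y_steps)
y_steps <- sub(", ", ";", y_steps)

# Folded version: the accumulator starts at x and passes through each pattern
y_fold <- Reduce(function(acc, pat) sub(pat, ";", acc), pats, x)

# Keep the intermediate results too: the initial value plus one per pattern
stages <- Reduce(function(acc, pat) sub(pat, ";", acc), pats, x,
                 accumulate = TRUE)

identical(y_steps, y_fold)
```

The closing bracket is still present at this point; it is removed by the sub(".$", "", y) step that follows.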

Note: The input x in reproducible form is:

x <- c("Tue 12-14 (w1-6, CLB 6)", "Mon 18-20 (w1-6, ColomboThC)", 
"Thu 14-16 (w1-6,7-9,10-12, CLB 8)", "Fri 13 (w2-9,10-13, Law 388)", 
"Fri 14 (w2-9,10-13, Sqhouse206)", "Fri 15 (w2-9,10-13, Sqhouse115)", 
"Thu 17 (w2-9,10-13, Block G16)", "Thu 18 (w2-9,10-13, Block G16)", 
"Mon 10 (w2-9,10-13, AinswthG01)", "Mon 11 (w2-9,10-13, Sqhouse203)", 
"Mon 12 (w2-9,10-13, Sqhouse206)", "Mon 13 (w2-9,10-13, BUS 114)", 
"Mon 16 (w2-9,10-13, Gold G03)", "Mon 17 (w2-9,10-13, Quad G047)", 
"Mon 20 (w2-9,10-13, Col LG02)", "Tue 17 (w2-9,10-13, Quad 1001)", 
"Tue 18 (w2-9,10-13, Quad 1001)", "Tue 19 (w2-9,10-13, Quad 1001)", 
"Tue 20 (w2-9,10-13)", "Wed 10 (w2-9,10-13, Quad 1046)", "Wed 11 (w2-9,10-13, Quad 1046)", 
"Wed 12 (w2-9,10-13, Quad 1046)", "Wed 13 (w2-9,10-13, Quad G046)")

0

Base functions can do this, so you don't need to load any other packages. tidyverse is great, but a little heavy for this:

x <- c("Thu 18 (w2-9,10-13, Block G16)", "Mon 18-20 (w1-6, ColomboThC)")
do.call(rbind, lapply(x, function(i) {
  y <- strsplit(i, " \\(w")[[1]]        # split off the bracketed part, dropping "(w"
  y[2] <- gsub("\\)", "", y[2])         # remove the closing bracket
  out1 <- strsplit(y[1], " ")[[1]]      # day and time
  out2 <- strsplit(y[2], ", ")[[1]]     # weeks and location
  listpart <- grepl("-", out2)          # week ranges contain a hyphen
  do.call(cbind, c(out1, list(out2[listpart]), out2[!listpart]))
}))

The output will be:

     [,1]  [,2]    [,3]        [,4]        
[1,] "Thu" "18"    "2-9,10-13" "Block G16" 
[2,] "Mon" "18-20" "1-6"       "ColomboThC"
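
Another base R route, offered only as a hedged sketch and not as part of the answer above, is a single regular expression with capture groups via regexec and regmatches. The pattern and the column names (day, time, weeks, location) are assumptions chosen for illustration, and the sketch assumes every string matches the pattern; a non-matching string would yield an empty match and break the rbind step.

```r
# Shortened copy of the input, including one entry without a location
x <- c("Tue 12-14 (w1-6, CLB 6)",
       "Thu 14-16 (w1-6,7-9,10-12, CLB 8)",
       "Tue 20 (w2-9,10-13)")

# Groups: day, time, weeks (must end in a digit), remainder before ")"
pattern <- "^([A-Za-z]+) ([0-9-]+) \\(w([0-9,-]*[0-9])(.*)\\)$"

# One character vector per string: full match followed by the 4 groups
m <- regmatches(x, regexec(pattern, x))
mat <- do.call(rbind, m)

DF <- data.frame(day = mat[, 2], time = mat[, 3], stringsAsFactors = FALSE)
DF$weeks <- strsplit(mat[, 4], ",")       # list column of week ranges
DF$location <- sub("^, ", "", mat[, 5])   # strip the leading ", " if present
```

Entries without a location end up with an empty string in location rather than NA.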
