I have a data.table similar to the following.

Data

library(data.table)
DT <- structure(list(N = 1:6, VN = c("v1", "v3", "v6", "v7a", "v18", 
"v23"), T1 = c("bigby (wolf)", "white", "red (rose)", "piggy (straw)", 
"(curse) beast", "prince"), T2 = c("jack (bean)", "snow (dwarves)", 
"beard (blue)", "bhageera (jungle) mowgli (book)", "beauty", 
"glass (slipper)"), T3 = c("hk (34)", "VL (r45)", "tg (h5)", 
"tt (HG) (45)", "gh", "vlp"), Val = c(36, 25, 0.84, 12, 78, 258
)), .Names = c("N", "VN", "T1", "T2", "T3", "Val"), class = "data.frame", row.names = c(NA, 
-6L))

setDT(DT)

DT
   N  VN            T1                              T2           T3    Val
1: 1  v1  bigby (wolf)                     jack (bean)      hk (34)  36.00
2: 2  v3         white                  snow (dwarves)     VL (r45)  25.00
3: 3  v6    red (rose)                    beard (blue)      tg (h5)   0.84
4: 4 v7a piggy (straw) bhageera (jungle) mowgli (book) tt (HG) (45)  12.00
5: 5 v18 (curse) beast                          beauty           gh  78.00
6: 6 v23        prince                 glass (slipper)          vlp 258.00

I want to extract all the strings within parentheses from columns T1 and T2 to a new column C.

I can do this for a single row as follows.

Rowwise calculations

setDF(DT)
dtf <- c("T1", "T2")
paste(unique(unlist(regmatches(DT[4,dtf], gregexpr("(?=\\().*?(?<=\\))", DT[4,dtf], perl=T)))), collapse=" ")
[1] "(straw) (jungle) (book)"
paste(unique(unlist(regmatches(DT[3,dtf], gregexpr("(?=\\().*?(?<=\\))", DT[3,dtf], perl=T)))), collapse=" ")
[1] "(rose) (blue)"
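
For reference, the pattern above can be read piece by piece: `(?=\\()` is a lookahead that requires the match to start at an opening parenthesis, `.*?` lazily consumes as few characters as possible, and `(?<=\\))` is a lookbehind that requires the match to end just after a closing parenthesis. The following is a minimal base R sketch, assuming non-nested parentheses as in the sample data. The variable names and the simpler pattern `simple` are illustrative and not part of the original code.

```r
x <- "bhageera (jungle) mowgli (book)"

# Pattern from the question: lookahead for "(", lazy match, lookbehind for ")".
lookaround <- "(?=\\().*?(?<=\\))"
# Illustrative simpler alternative: "(", any run of non-")" characters, ")".
simple <- "\\([^)]*\\)"

m1 <- regmatches(x, gregexpr(lookaround, x, perl = TRUE))[[1]]
m2 <- regmatches(x, gregexpr(simple, x))[[1]]

print(m1)                 # the two parenthesised groups
print(identical(m1, m2))  # both patterns agree on this input
```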

I am not able to get similar results using data.table.

Attempt with data.table

setDT(DT)
DT[, C := paste(unique(unlist(regmatches(get(dtf), gregexpr("(?=\\().*?(?<=\\))", get(dtf), perl=T)))), collapse=" ")]

How can I use data.table to get the desired result?

Desired result

out <- structure(list(N = 1:6, VN = c("v1", "v3", "v6", "v7a", "v18", 
"v23"), T1 = c("bigby (wolf)", "white", "red (rose)", "piggy (straw)", 
"(curse) beast", "prince"), T2 = c("jack (bean)", "snow (dwarves)", 
"beard (blue)", "bhageera (jungle) mowgli (book)", "beauty", 
"glass (slipper)"), T3 = c("hk (34)", "VL (r45)", "tg (h5)", 
"tt (HG) (45)", "gh", "vlp"), Val = c(36, 25, 0.84, 12, 78, 258
), C = c("(wolf) (bean)", "(dwarves)", "(rose) (blue)", "(straw) (jungle) (book)", 
"(curse)", "(slipper)")), .Names = c("N", "VN", "T1", "T2", "T3", 
"Val", "C"), class = "data.frame", row.names = c(NA, -6L))
out
  N  VN            T1                              T2           T3    Val                       C
1 1  v1  bigby (wolf)                     jack (bean)      hk (34)  36.00           (wolf) (bean)
2 2  v3         white                  snow (dwarves)     VL (r45)  25.00               (dwarves)
3 3  v6    red (rose)                    beard (blue)      tg (h5)   0.84           (rose) (blue)
4 4 v7a piggy (straw) bhageera (jungle) mowgli (book) tt (HG) (45)  12.00 (straw) (jungle) (book)
5 5 v18 (curse) beast                          beauty           gh  78.00                 (curse)
6 6 v23        prince                 glass (slipper)          vlp 258.00               (slipper)

1 Answer

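You can use by and .SDcols to do this. Grouping by N, which is unique per row here, evaluates the expression once per row, and .SDcols restricts .SD to the columns named in dtf.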

setDT(DT)
dtf <- c("T1", "T2")
DT[, C := paste(unique(unlist(regmatches(.SD, gregexpr("(?=\\().*?(?<=\\))", .SD, perl=T)))), 
                collapse=" "), 
   by = N, 
   .SDcols = dtf]
DT
##    N  VN            T1                              T2           T3    Val                       C
## 1: 1  v1  bigby (wolf)                     jack (bean)      hk (34)  36.00           (wolf) (bean)
## 2: 2  v3         white                  snow (dwarves)     VL (r45)  25.00               (dwarves)
## 3: 3  v6    red (rose)                    beard (blue)      tg (h5)   0.84           (rose) (blue)
## 4: 4 v7a piggy (straw) bhageera (jungle) mowgli (book) tt (HG) (45)  12.00 (straw) (jungle) (book)
## 5: 5 v18 (curse) beast                          beauty           gh  78.00                 (curse)
## 6: 6 v23        prince                 glass (slipper)          vlp 258.00               (slipper)
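
If N is not unique, or the table is large enough that grouping by every row becomes slow, a column-wise variant may be worth trying. The sketch below is one possible approach rather than the method shown above: `extract_parens` is a hypothetical helper, it swaps the lookaround pattern for the simpler `\\([^)]*\\)`, and it assumes the columns are character vectors with non-nested parentheses.

```r
library(data.table)

# Hypothetical helper: takes one or more character vectors of equal length
# and returns, per position, the unique "(...)" groups joined by spaces.
extract_parens <- function(...) {
  pattern <- "\\([^)]*\\)"  # assumes non-nested parentheses
  # One list of matches per input column.
  per_column <- lapply(unname(list(...)), function(x) {
    regmatches(x, gregexpr(pattern, x))
  })
  # Concatenate the matches of all columns position by position.
  per_row <- do.call(Map, c(list(f = c), per_column))
  vapply(per_row, function(m) paste(unique(m), collapse = " "),
         character(1), USE.NAMES = FALSE)
}

# Small stand-in for the question's table (first four rows of T1 and T2).
DT <- data.table(
  N  = 1:4,
  T1 = c("bigby (wolf)", "white", "red (rose)", "piggy (straw)"),
  T2 = c("jack (bean)", "snow (dwarves)", "beard (blue)",
         "bhageera (jungle) mowgli (book)")
)
dtf <- c("T1", "T2")

# No `by` is needed because the helper is vectorised over rows.
DT[, C := do.call(extract_parens, .SD), .SDcols = dtf]
print(DT$C)
```

With this helper, rows that contain no parentheses get an empty string in C rather than NA.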

1 Comment

If a large number of rows have no parentheses in either T1 or T2, you may want to restrict the update to the rows that do contain one, along the lines of: DT[grepl("(", T1, fixed = TRUE) | grepl("(", T2, fixed = TRUE), C := ...]
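
The subsetting idea from this comment can be sketched as follows. A bare `"("` is not a valid regular expression, so the sketch passes `fixed = TRUE` to `grepl`. The `has_paren` variable and the three-row table are illustrative assumptions, not part of the original post.

```r
library(data.table)

DT <- data.table(
  N  = 1:3,
  T1 = c("bigby (wolf)", "white", "prince"),
  T2 = c("jack (bean)", "snow", "glass (slipper)")
)
dtf <- c("T1", "T2")

# TRUE for rows where any column in dtf contains a literal "(".
has_paren <- DT[, Reduce(`|`, lapply(.SD, grepl, pattern = "(", fixed = TRUE)),
                .SDcols = dtf]

# Run the per-row extraction only on those rows; the others keep NA in C.
DT[has_paren,
   C := {
     x <- unlist(.SD)
     paste(unique(unlist(regmatches(x, gregexpr("(?=\\().*?(?<=\\))", x, perl = TRUE)))),
           collapse = " ")
   },
   by = N, .SDcols = dtf]
print(DT)
```

Rows skipped by the filter end up with NA in C, which may need handling downstream.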
