Consider a sample dataset:
library(data.table)
library(stringi)
library(stringr)
dt <- data.table(data.frame(V1 = c("C1/R3","M2/R4")))
> dt
V1
1: C1/R3
2: M2/R4
For each row of dt, I want to extract the characters C, M, or R and concatenate them. For example,
dt[,V2 := stri_join_list(str_match_all(V1,"[CMR]"),sep="",collapse=""),by=seq_len(nrow(dt))]
> dt
V1 V2
1: C1/R3 CR
2: M2/R4 MR
However, I have 42 million rows and the above code is not nearly efficient enough. Is there a way to do this without row-wise operations? When I skip the by argument, I get the entry CRMR for each row.
by=? - dt[,V2 := stri_join_list(str_match_all(V1,"[CMR]"))] - I'm not sure how you are ending up with NA values, but you might want to include a row that does so in your example.
CR for the first row and MR for the second row, as per your original dt before the update. You need to remove the collapse="" which is in your code (and not in mine).
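To make the suggestion above concrete, here is a minimal sketch of the column-wise version, assuming the stringi and data.table packages are installed. It swaps str_match_all for stri_extract_all_regex (my substitution, not something from the thread) and adds a base R gsub variant for comparison; the column name V3 is only for illustration.

```r
library(data.table)
library(stringi)

dt <- data.table(V1 = c("C1/R3", "M2/R4"))

# Vectorised over the whole column: no by=, and no collapse=, so each row
# keeps its own result instead of all rows being pasted into one string.
dt[, V2 := stri_join_list(stri_extract_all_regex(V1, "[CMR]"), sep = "")]

# Base R alternative (assumption: deleting every character that is not
# C, M or R is equivalent for this data). A row with no match gives ""
# here, whereas the stringi version above gives NA for such a row.
dt[, V3 := gsub("[^CMR]", "", V1)]

print(dt)
```

Both columns should come out as CR and MR for the sample data. I have not timed either on 42 million rows, so it is worth benchmarking them on a subset before committing to one.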