DT <- data.table(num=c("20031111","1112003","23423","2222004"),y=c("2003","2003","2003","2004"))

> DT
        num    y
1: 20031111 2003
2:  1112003 2003
3:    23423 2003
4:  2222004 2004

I want to compare the contents of the two cells in each row and act on the result. For instance, if "num" contains the year in "y", create a column x holding the value of num. I thought about subsetting with grep. That runs, but it checks the whole column every time, which seems wasteful:

DT[grep(y, num)] # runs, but warns that the pattern has length > 1 (only the first element of y is used)

I could apply() my way but perhaps there's a data.table way?

Thanks

  • @Frank: I didn't; I changed the question to match the question, namely the redirected question on grepl and stringi which we both just contributed on. If you're not going to enlarge this title to make it sufficiently broad as a primary source, please revert that redirect. R has many many more string-matching functions than just grep. Commented Oct 28, 2015 at 21:32

2 Answers


If you're happy using the stringi package, this is a way that takes advantage of the fact that the stringi functions vectorise both pattern and string:

DT[stri_detect_fixed(num, y), x := num]
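To illustrate what "vectorise both pattern and string" means here, a minimal sketch on plain vectors (copied from the question's data, without data.table) shows the element-wise comparison. The names `hit` and `x` are only illustrative.

```r
library(stringi)

# Same values as the question's DT, as plain character vectors
num <- c("20031111", "1112003", "23423", "2222004")
y   <- c("2003", "2003", "2003", "2004")

# Element i of `y` is searched for (as a fixed string) in element i of `num`
hit <- stri_detect_fixed(num, y)

# Keep num where its own year was found, NA otherwise
x <- ifelse(hit, num, NA_character_)
print(hit)
print(x)
```

Each row is tested only against its own pattern, so there is no pass over the whole column per row.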

Depending on the data, it may be faster than the method posted by Veerenda Gadekar.

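library(data.table)
library(stringi)
library(microbenchmark)

# 1000 rows: a random number with a random year appended, plus an independent random year
DT <- data.table(num=paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                 y=as.character(sample(2001:2010, 1000, TRUE)))
microbenchmark(
    vg = DT[, x := grep(y, num, value=TRUE, fixed=TRUE), by = .(num, y)],
    nk = DT[stri_detect_fixed(num, y), x := num]
)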

#Unit: microseconds
# expr      min       lq     mean   median       uq      max neval
#   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
#   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100

1 Comment

@DavidArenburg changed to stri_detect_fixed

You could do this, grouping by both columns so that grep sees a single pattern and a single string in each group:

DT[, x := grep(y, num, value = TRUE, fixed = TRUE), by = .(num, y)]

#> DT
#        num    y        x
#1: 20031111 2003 20031111
#2:  1112003 2003  1112003
#3:    23423 2003       NA
#4:  2222004 2004  2222004
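If neither grouping nor an extra package is wanted, one possible base R alternative (not part of the original answer, and not benchmarked here) is to pair each pattern with its own string via `mapply`. The sketch below uses plain vectors copied from the question. The names `hit` and `x` are only illustrative.

```r
# Same values as the question's DT, as plain character vectors
num <- c("20031111", "1112003", "23423", "2222004")
y   <- c("2003", "2003", "2003", "2004")

# Call grepl(pattern = y[i], x = num[i], fixed = TRUE) once per row
hit <- mapply(grepl, y, num, MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep num where its own year was found, NA otherwise
x <- ifelse(hit, num, NA_character_)
print(x)
```

Inside a data.table, the same logical vector could presumably be used as the row filter, along the lines of `DT[mapply(grepl, y, num, MoreArgs = list(fixed = TRUE)), x := num]`.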

1 Comment

Great, thanks! I hadn't thought of the "value" option. I also thought subsetting would be better, for some reason.
