Data table multiple condition with string vector

Question

I would like to subset a data.table with a condition that is applied to every column named in a character vector, with the individual conditions joined by &. Example:

library(data.table)    
test <- setDT(as.data.frame(list(ID = c(rep(1,10),rep(2,10)), time = rep(c(1:10),2),
                                 Input = rep(c(array(data = 0, dim = 5),1,array(data = 0, dim = 4)),2), 
                                 replicate(4,sample(c(1:20), 10, replace = TRUE)))))

signalcolumns <- colnames(test)[! colnames(test) %in% c("ID","Input","time")]

Now I want

test[X1 > 5 & X2 > 5 & X3 > 5 & X4 > 5]

and I would like to write it using signalcolumns. However,

test[get(signalcolumns) > 5]

doesn't work, because it applies the condition only to the first column, X1. I don't see what syntax I could use here. I thought of building and evaluating an expression like

c(paste0(signalcolumns[1:(length(signalcolumns)-1)], " > 5 &"),
  paste0(signalcolumns[length(signalcolumns)], " > 5"))

but I am a bit stuck here.

Not exactly full data.table, but you can do test[rowSums(test[, .SD > 5, .SDcols = signalcolumns]) == length(signalcolumns), ]
– Sotos, Commented Dec 5, 2017 at 11:18

akrun · Accepted Answer · 2017-12-05 11:59:40Z

4

After specifying .SDcols as signalcolumns, loop over the columns of the Subset of Data.table (.SD), check whether each value is greater than 5, and then Reduce the resulting list with `&` to a single TRUE/FALSE vector with one element per row. That vector is then used to subset the rows.

test[test[, Reduce(`&`, lapply(.SD, `>`, 5)), .SDcols = signalcolumns]]
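The one-liner above can be unpacked into its two steps. The following is a minimal sketch on a small hand-made table; the names dt, cols, flags and keep and the values are illustrative assumptions, not the question's random data.

```r
library(data.table)

# Small deterministic table (illustrative values only)
dt <- data.table(ID = 1:4,
                 X1 = c(6, 7, 2, 9),
                 X2 = c(8, 3, 9, 6),
                 X3 = c(7, 9, 9, 10))
cols <- c("X1", "X2", "X3")

# Step 1: one logical vector per selected column (a list of length 3)
flags <- lapply(dt[, cols, with = FALSE], `>`, 5)

# Step 2: fold the list with `&`, i.e. (X1 > 5 & X2 > 5) & X3 > 5
keep <- Reduce(`&`, flags)
keep
# [1]  TRUE FALSE FALSE  TRUE

# Same result as the nested form used in the answer
same <- dt[, Reduce(`&`, lapply(.SD, `>`, 5)), .SDcols = cols]

dt[keep]
```

Because Reduce combines the columns pairwise, the same pattern should work unchanged for any number of columns in cols.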

answered Dec 5, 2017 at 11:59


1 Comment

denis Over a year ago

Perfect, exactly what I was looking for. I wouldn't have thought of Reduce.

minem · Answer · 2017-12-05 12:57:07Z

0

I would do something like this:

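# x: character vector of column names, y: threshold
# Note: relies on the global data.table `test`
testVars <- function(x, y){
  X <- test[, x, with = FALSE]  # keep only the columns named in x
  X <- X > y                    # logical matrix, one column per name
  X <- rowSums(X)               # number of TRUE values per row
  X == length(x)                # TRUE where every column passed
}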

test[testVars(signalcolumns, 5)]
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 14  9 15  6
# 2:  1    5     0 14 12 20 16
# 3:  1    6     1 17  8 19 18
# 4:  1   10     0  6 17  8 14
# 5:  2    4     0 14  9 15  6
# 6:  2    5     0 14 12 20 16
# 7:  2    6     1 17  8 19 18
# 8:  2   10     0  6 17  8 14
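The function above reads test from the global environment. A minimal variant that takes the table as an argument might look as follows; the name all_above and the example values are hypothetical and only illustrate the idea.

```r
library(data.table)

# Hypothetical helper: same rowSums logic, but the table is an argument
all_above <- function(dt, cols, threshold) {
  above <- dt[, cols, with = FALSE] > threshold  # logical matrix
  rowSums(above) == length(cols)                 # TRUE where all columns pass
}

# Illustrative data, not the question's random table
dt <- data.table(ID = 1:3, X1 = c(6, 2, 9), X2 = c(8, 9, 7))
res <- dt[all_above(dt, c("X1", "X2"), 5)]
res
```

Rows 1 and 3 are kept here, since row 2 has X1 = 2.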

edited Dec 5, 2017 at 12:57

answered Dec 5, 2017 at 11:19


3 Comments

Roland Over a year ago

rowSums is inefficient because it needs to copy and coerce to matrix.

minem Over a year ago

@denis it did work; the column names in the output were shifted incorrectly. I've updated the answer.

denis Over a year ago

Thanks. I like the logic of your answer; it is similar to Sotos's comment.

Eric Watt · Answer · 2017-12-05 19:51:16Z

0

test[apply(test[, signalcolumns, with = FALSE] > 5, 1, all)]
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 18 14 11 17
# 2:  1    8     0 15 20 15 14
# 3:  2    4     0 18 14 11 17
# 4:  2    8     0 15 20 15 14

Update

Here is a walk-through of the steps followed.

test
#     ID time Input X1 X2 X3 X4
#  1:  1    1     0 11  5 12  3
#  2:  1    2     0 15  4 17 10
#  3:  1    3     0  3 16 10 19
#  4:  1    4     0 18 14 11 17
#  5:  1    5     0 10 18  7  3
#  6:  1    6     1  2 16  3  6
#  7:  1    7     0  2  4  5  5
#  8:  1    8     0 15 20 15 14
#  9:  1    9     0 16 20 11  5
# 10:  1   10     0 14  5  6 11
# 11:  2    1     0 11  5 12  3
# 12:  2    2     0 15  4 17 10
# 13:  2    3     0  3 16 10 19
# 14:  2    4     0 18 14 11 17
# 15:  2    5     0 10 18  7  3
# 16:  2    6     1  2 16  3  6
# 17:  2    7     0  2  4  5  5
# 18:  2    8     0 15 20 15 14
# 19:  2    9     0 16 20 11  5
# 20:  2   10     0 14  5  6 11

Now generate a logical matrix of TRUE/FALSE values indicating whether each value in the signal columns is > 5:

test_truth <- test[, signalcolumns, with = FALSE] > 5
test_truth
#          X1    X2    X3    X4
#  [1,]  TRUE FALSE  TRUE FALSE
#  [2,]  TRUE FALSE  TRUE  TRUE
#  [3,] FALSE  TRUE  TRUE  TRUE
#  [4,]  TRUE  TRUE  TRUE  TRUE
#  [5,]  TRUE  TRUE  TRUE FALSE
#  [6,] FALSE  TRUE FALSE  TRUE
#  [7,] FALSE FALSE FALSE FALSE
#  [8,]  TRUE  TRUE  TRUE  TRUE
#  [9,]  TRUE  TRUE  TRUE FALSE
# [10,]  TRUE FALSE  TRUE  TRUE
# [11,]  TRUE FALSE  TRUE FALSE
# [12,]  TRUE FALSE  TRUE  TRUE
# [13,] FALSE  TRUE  TRUE  TRUE
# [14,]  TRUE  TRUE  TRUE  TRUE
# [15,]  TRUE  TRUE  TRUE FALSE
# [16,] FALSE  TRUE FALSE  TRUE
# [17,] FALSE FALSE FALSE FALSE
# [18,]  TRUE  TRUE  TRUE  TRUE
# [19,]  TRUE  TRUE  TRUE FALSE
# [20,]  TRUE FALSE  TRUE  TRUE

Then use apply over each row (margin 1). The function applied is all, which returns TRUE if every value passed to it is TRUE and FALSE if any value is FALSE. The result is therefore TRUE exactly for the rows in which all four signal columns are > 5.

truth_vect <- apply(test_truth, 1, all)
truth_vect
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

This is the vector we need to use to subset the table.

test[truth_vect]
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 18 14 11 17
# 2:  1    8     0 15 20 15 14
# 3:  2    4     0 18 14 11 17
# 4:  2    8     0 15 20 15 14
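As a quick sanity check, the three approaches from this thread (Reduce, rowSums and apply) can be compared on the same data. The sketch below uses a seeded random table; the seed, the table dt and the variable names are assumptions made purely for reproducibility.

```r
library(data.table)

set.seed(42)  # arbitrary seed, fixed only so the run is reproducible
dt <- data.table(ID = 1:20,
                 X1 = sample(1:20, 20, replace = TRUE),
                 X2 = sample(1:20, 20, replace = TRUE),
                 X3 = sample(1:20, 20, replace = TRUE),
                 X4 = sample(1:20, 20, replace = TRUE))
cols <- paste0("X", 1:4)

# Column-wise fold, no matrix conversion
by_reduce <- dt[, Reduce(`&`, lapply(.SD, `>`, 5)), .SDcols = cols]

# Both of these go through a logical matrix first
truth <- dt[, cols, with = FALSE] > 5
by_rowsums <- unname(rowSums(truth) == length(cols))
by_apply <- unname(apply(truth, 1, all))

# All three should select exactly the same rows
stopifnot(identical(by_reduce, by_rowsums),
          identical(by_reduce, by_apply))
```

The apply and rowSums variants build the intermediate matrix truth, whereas the Reduce variant works column by column.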

edited Dec 5, 2017 at 19:51

answered Dec 5, 2017 at 16:44


3 Comments

denis Over a year ago

I am not sure I understand it properly, but it is quite a compact syntax. Thank you.

Eric Watt Over a year ago

@denis good point, I did not explain it at all. I've updated the answer to walk through the steps to hopefully make it more clear.

denis Over a year ago

Thanks for the effort. I like it a lot; it is quite similar to akrun's solution.
