
I would like to subset a data.table with a condition that I want to apply to every column named in a vector of strings, with & between the conditions. Example:

library(data.table)    
test <- setDT(as.data.frame(list(ID = c(rep(1,10),rep(2,10)), time = rep(c(1:10),2),
                                 Input = rep(c(array(data = 0, dim = 5),1,array(data = 0, dim = 4)),2), 
                                 replicate(4,sample(c(1:20), 10, replace = TRUE)))))

signalcolumns <- colnames(test)[! colnames(test) %in% c("ID","Input","time")]

Now I want

test[X1 > 5 & X2 > 5 & X3 > 5 & X4 > 5]

and I would like to write it using signalcolumns instead of listing each column by hand.

test[get(signalcolumns) > 5]

doesn't work, as it applies the condition only to the first column, X1. I don't see what syntax I could use here. I thought of trying to evaluate an expression like

c(paste0(signalcolumns[1:(length(signalcolumns)-1)],">5 &"),
  paste0(signalcolumns[length(signalcolumns)],">5"))

but I am a bit stuck here.
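For reference, the string-building idea can be completed. This is a minimal sketch assuming a small made-up table DT in place of the random test: paste0() with collapse = " & " joins the per-column conditions into one string, and eval(parse(text = ...)) inside i evaluates that string against the table's columns. Building code from strings is easy to get wrong, which is one reason the answers below avoid it.

```r
library(data.table)

# Hypothetical small table standing in for the question's random `test`
DT <- data.table(ID = 1:4,
                 X1 = c(6, 2, 9, 7),
                 X2 = c(8, 9, 1, 6),
                 X3 = c(7, 6, 8, 10))
cols <- c("X1", "X2", "X3")

# One condition string: "X1 > 5 & X2 > 5 & X3 > 5"
cond <- paste0(cols, " > 5", collapse = " & ")

# parse() turns the string into an expression; evaluated inside i,
# the column names resolve to the columns of DT
res <- DT[eval(parse(text = cond))]
print(res)  # keeps the rows with ID 1 and 4
```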

  • Not exactly full data.table, but you can do test[rowSums(test[, .SD > 5, .SDcols = signalcolumns]) == length(signalcolumns),] Commented Dec 5, 2017 at 11:18
  • Nice one, thanks for sharing Commented Dec 5, 2017 at 13:05

3 Answers


After specifying .SDcols as signalcolumns, loop through the columns of .SD (the Subset of Data.table) with lapply, check whether each value is greater than 5, and then Reduce the resulting list with & into a single TRUE/FALSE vector (one element per row), which is used to subset the rows.

test[test[, Reduce(`&`, lapply(.SD, `>`, 5)), .SDcols = signalcolumns]]
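To make the one-liner easier to follow, here is a minimal sketch that runs the same steps separately on a small made-up table DT (an assumption, standing in for the random test): lapply() builds one logical vector per column, Reduce(`&`, ...) combines that list pairwise into a single row-wise vector, and that vector subsets the rows.

```r
library(data.table)

# Hypothetical small table standing in for the question's random `test`
DT <- data.table(ID = 1:4,
                 X1 = c(6, 2, 9, 7),
                 X2 = c(8, 9, 1, 6),
                 X3 = c(7, 6, 8, 10))
cols <- c("X1", "X2", "X3")

# Step 1: a list with one logical vector per column (`>`(x, 5) is x > 5)
per_col <- lapply(DT[, cols, with = FALSE], `>`, 5)
str(per_col)

# Step 2: fold the list with `&`, i.e. (X1 > 5 & X2 > 5) & X3 > 5
keep <- Reduce(`&`, per_col)

# The answer's one-liner computes the same vector inside j
stopifnot(identical(keep,
                    DT[, Reduce(`&`, lapply(.SD, `>`, 5)), .SDcols = cols]))

# Step 3: use the logical vector as the row filter
DT[keep]
```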

1 Comment

Perfect, exactly what I was looking for. I wouldn't have thought of Reduce.

I would do something like this:

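testVars <- function(DT, x, y){
  X <- DT[, x, with = FALSE]  # keep only the columns named in x
  X <- X > y                  # logical matrix: is each value > y?
  X <- rowSums(X)             # number of TRUEs in each row
  X == length(x)              # TRUE where every column passed
}

test[testVars(test, signalcolumns, 5)]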
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 14  9 15  6
# 2:  1    5     0 14 12 20 16
# 3:  1    6     1 17  8 19 18
# 4:  1   10     0  6 17  8 14
# 5:  2    4     0 14  9 15  6
# 6:  2    5     0 14 12 20 16
# 7:  2    6     1 17  8 19 18
# 8:  2   10     0  6 17  8 14
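The counting logic behind testVars (and Sotos's comment) relies on TRUE counting as 1 and FALSE as 0, so a row's sum equals the number of columns exactly when every check in that row is TRUE. A minimal base R sketch on a made-up logical matrix:

```r
# Hypothetical logical matrix: one row per observation, one column per
# check; matrix() fills column by column
m <- matrix(c(TRUE, FALSE, FALSE, TRUE,   # X1 > 5
              TRUE, TRUE,  FALSE, TRUE,   # X2 > 5
              TRUE, TRUE,  TRUE,  TRUE),  # X3 > 5
            nrow = 4,
            dimnames = list(NULL, c("X1", "X2", "X3")))

counts <- rowSums(m)            # number of TRUEs per row
all_true <- counts == ncol(m)   # TRUE only where every check passed
print(counts)
print(all_true)
```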

3 Comments

rowSums is inefficient because it needs to copy and coerce to matrix.
@denis it works now; the column names were shifted incorrectly. I've updated the answer.
Thanks. I like the logic of your answer; it is similar to Sotos's comment.
test[apply(test[, signalcolumns, with = FALSE] > 5, 1, all)]
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 18 14 11 17
# 2:  1    8     0 15 20 15 14
# 3:  2    4     0 18 14 11 17
# 4:  2    8     0 15 20 15 14

Update

Here is a walk-through of the steps followed.

test
#     ID time Input X1 X2 X3 X4
#  1:  1    1     0 11  5 12  3
#  2:  1    2     0 15  4 17 10
#  3:  1    3     0  3 16 10 19
#  4:  1    4     0 18 14 11 17
#  5:  1    5     0 10 18  7  3
#  6:  1    6     1  2 16  3  6
#  7:  1    7     0  2  4  5  5
#  8:  1    8     0 15 20 15 14
#  9:  1    9     0 16 20 11  5
# 10:  1   10     0 14  5  6 11
# 11:  2    1     0 11  5 12  3
# 12:  2    2     0 15  4 17 10
# 13:  2    3     0  3 16 10 19
# 14:  2    4     0 18 14 11 17
# 15:  2    5     0 10 18  7  3
# 16:  2    6     1  2 16  3  6
# 17:  2    7     0  2  4  5  5
# 18:  2    8     0 15 20 15 14
# 19:  2    9     0 16 20 11  5
# 20:  2   10     0 14  5  6 11

Now generate a logical matrix that is TRUE wherever a value is > 5:

test_truth <- test[, signalcolumns, with = FALSE] > 5
test_truth
#          X1    X2    X3    X4
#  [1,]  TRUE FALSE  TRUE FALSE
#  [2,]  TRUE FALSE  TRUE  TRUE
#  [3,] FALSE  TRUE  TRUE  TRUE
#  [4,]  TRUE  TRUE  TRUE  TRUE
#  [5,]  TRUE  TRUE  TRUE FALSE
#  [6,] FALSE  TRUE FALSE  TRUE
#  [7,] FALSE FALSE FALSE FALSE
#  [8,]  TRUE  TRUE  TRUE  TRUE
#  [9,]  TRUE  TRUE  TRUE FALSE
# [10,]  TRUE FALSE  TRUE  TRUE
# [11,]  TRUE FALSE  TRUE FALSE
# [12,]  TRUE FALSE  TRUE  TRUE
# [13,] FALSE  TRUE  TRUE  TRUE
# [14,]  TRUE  TRUE  TRUE  TRUE
# [15,]  TRUE  TRUE  TRUE FALSE
# [16,] FALSE  TRUE FALSE  TRUE
# [17,] FALSE FALSE FALSE FALSE
# [18,]  TRUE  TRUE  TRUE  TRUE
# [19,]  TRUE  TRUE  TRUE FALSE
# [20,]  TRUE FALSE  TRUE  TRUE

Then use apply over each row (MARGIN = 1) with the function all, which returns TRUE only if every value passed to it is TRUE, and FALSE otherwise. The result is a logical vector with one element per row, TRUE exactly where all of the signal columns are > 5.

truth_vect <- apply(test_truth, 1, all)
truth_vect
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

This is the vector we need to use to subset the table.

test[truth_vect]
#    ID time Input X1 X2 X3 X4
# 1:  1    4     0 18 14 11 17
# 2:  1    8     0 15 20 15 14
# 3:  2    4     0 18 14 11 17
# 4:  2    8     0 15 20 15 14

3 Comments

I am not sure I understand it properly, but it is quite a compact syntax. Thank you.
@denis good point, I did not explain it at all. I've updated the answer to walk through the steps to hopefully make it more clear.
Thanks for the effort. I like it a lot; it is quite similar to akrun's solution.
