
I am building a Shiny app with plotly, and need to filter data on the basis of a number of parameters. Currently I am doing this with a flag in a data.table, updated by reference. The actual data have many columns, and I would vastly prefer an extensible way of adding columns to be visualised. I am coming up short in one area: the actual filtering of the data on the basis of values.

I store the names of the columns to be filtered in a character vector, but it seems that I can't use it to define the expression by which rows are selected (i.e. the i expression). Is this possible, or am I approaching this the wrong way?

library(data.table)

set.seed(12345)

dt = data.table(mtcars)
dt[,filtered := FALSE]


filterColumnNames = c('cyl','gear','carb')

filterValues = list(cyl = c(4,6),
                    gear = c(3),
                    carb = c(1))

for (columnName in filterColumnNames) {
  dt[columnName %in% filterValues[columnName][[1]], filtered := TRUE]
}

# Working, but not loopy enough.
# dt[cyl %in% filterValues['cyl'][[1]], filtered := TRUE]
# dt[gear %in% filterValues['gear'][[1]], filtered := TRUE]
# dt[carb %in% filterValues['carb'][[1]], filtered := TRUE]

print(dt)

3 Answers


Another way to achieve this is to use a join to select the rows:

library(data.table)
dt <- as.data.table(mtcars)
filterValues <- list(cyl = c(4,6),
                     gear = c(3),
                     carb = c(1))
dt[do.call(CJ, filterValues), on = names(filterValues), filtered := TRUE][]
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb filtered
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       NA
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       NA
 3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1       NA
 4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1     TRUE
 5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2       NA
 6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1     TRUE
 7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4       NA
 8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2       NA
 9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2       NA
10: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4       NA
11: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4       NA
12: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3       NA
13: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3       NA
14: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3       NA
15: 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4       NA
16: 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4       NA
17: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4       NA
18: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1       NA
19: 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2       NA
20: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1       NA
21: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1     TRUE
22: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2       NA
23: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2       NA
24: 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4       NA
25: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2       NA
26: 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1       NA
27: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       NA
28: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2       NA
29: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4       NA
30: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6       NA
31: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       NA
32: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2       NA
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb filtered

or

dt <- as.data.table(mtcars)
dt[do.call(CJ, filterValues), on = names(filterValues), nomatch = 0L]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
2: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
3: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1

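You only need to specify the list filterValues. do.call(CJ, filterValues) builds a cross join (CJ) of the filter values, i.e. a data.table with every combination of the allowed values, and the join selects the rows that match one of those combinations. A row is therefore selected only if its cyl, gear and carb values all match at once (logical AND), unlike the loop in the question, which flags a row as soon as any single condition holds. The combinations used here are: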

   cyl gear carb
1:   4    3    1
2:   6    3    1
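To make the AND semantics of the join visible, here is a minimal sketch using only data.table and base R. It compares the rows selected by the join (via which = TRUE) with a column-by-column version: one logical vector per filter column, combined with & (all conditions, the same rows as the join) or with | (any condition, which is what the loop in the question does). The names andRows, hits, andFlag and orFlag are illustrative, not part of the answer above.

```r
library(data.table)

dt <- as.data.table(mtcars)
filterValues <- list(cyl = c(4, 6), gear = c(3), carb = c(1))

# Row numbers selected by the join: a row matches only if its
# (cyl, gear, carb) combination is one of the rows of the cross join.
andRows <- dt[do.call(CJ, filterValues), on = names(filterValues),
              nomatch = 0L, which = TRUE]

# The same selection built column by column: one logical vector per filter.
hits <- Map(function(col, vals) dt[[col]] %in% vals,
            names(filterValues), filterValues)
andFlag <- Reduce(`&`, hits)  # every condition holds (same rows as the join)
orFlag  <- Reduce(`|`, hits)  # at least one condition holds (like the loop in the question)

# Store whichever semantics is wanted as the flag column.
dt[, filtered := andFlag]
```

The join is usually the more compact choice when all conditions must hold; the Reduce version makes it easy to switch to "any condition" without changing the rest of the code.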

Edit

The OP has asked if this could be extended to inequalities.

This can be done with data.table's non-equi joins, but the setup is somewhat different, e.g.:

filterIntervals <- list(disp = c(200, 300),
                        mpg = c(10, 20))
mDT <- dcast(melt(filterIntervals), . ~ L1 + rowid(L1))
filterCondition <- c("disp>=disp_1", "disp<disp_2", "mpg>mpg_1", "mpg<mpg_2")
dt[mDT, on = filterCondition, filtered := TRUE][]
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb filtered
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       NA
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       NA
 3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1       NA
 4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1       NA
 5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2       NA
 6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1     TRUE
 7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4       NA
 8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2       NA
 9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2       NA
10: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4       NA
11: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4       NA
12: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3     TRUE
13: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3     TRUE
14: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3     TRUE
15: 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4       NA
16: 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4       NA
17: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4       NA
18: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1       NA
19: 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2       NA
20: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1       NA
21: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       NA
22: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2       NA
23: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2       NA
24: 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4       NA
25: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2       NA
26: 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1       NA
27: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       NA
28: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2       NA
29: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4       NA
30: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6       NA
31: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       NA
32: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2       NA
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb filtered
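Depending on the installed data.table and reshape2 versions, calling melt() on a plain list may go through a deprecated redirection to reshape2, so the following is a sketch that builds the one-row table of bounds directly and generates the non-equi join conditions from the column names. The _lo/_hi suffixes and the names lo, hi and bounds are illustrative choices. Note that every lower bound here uses >= (half-open intervals), whereas the example above uses a strict > for mpg.

```r
library(data.table)

dt <- as.data.table(mtcars)
filterIntervals <- list(disp = c(200, 300), mpg = c(10, 20))
cols <- names(filterIntervals)

# Lower and upper bound per column, as named numeric vectors.
lo <- vapply(filterIntervals, function(x) x[1], numeric(1))
hi <- vapply(filterIntervals, function(x) x[2], numeric(1))

# One-row table with columns disp_lo, mpg_lo, disp_hi, mpg_hi.
bounds <- as.data.table(c(setNames(as.list(lo), paste0(cols, "_lo")),
                          setNames(as.list(hi), paste0(cols, "_hi"))))

# Join conditions generated from the column names:
# "disp>=disp_lo" "mpg>=mpg_lo" "disp<disp_hi" "mpg<mpg_hi"
filterCondition <- c(paste0(cols, ">=", cols, "_lo"),
                     paste0(cols, "<", cols, "_hi"))

# Non-equi join; matching rows get filtered = TRUE, all others NA.
dt[bounds, on = filterCondition, filtered := TRUE]
```

Adding a column to filterIntervals is then enough to extend the filter; replacing ">=" with ">" in filterCondition gives strict lower bounds.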

Comment:

Thanks Uwe, your first solution is remarkably fast, which has made me rethink the architecture of the rest of the app. Would it be straightforward to apply this approach to inequalities?

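The reason is that columnName before %in% is not evaluated to the column it names: it is just the character string (e.g. "cyl"), so the condition becomes "cyl" %in% c(4, 6), which is FALSE. To compare the column's values instead, we can either use get: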

for (columnName in filterColumnNames) {
  # get() looks up the column whose name is stored in columnName
  dt[get(columnName) %in% filterValues[[columnName]], filtered := TRUE]
}

or eval(as.name(columnName)):

for (columnName in filterColumnNames) {
  # as.name() turns the string into a symbol, which eval() resolves to the column
  dt[eval(as.name(columnName)) %in% filterValues[[columnName]], filtered := TRUE]
}
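Building on the eval(as.name(...)) idea, the whole i expression can also be constructed once as a single call instead of looping. This is a sketch; conds and allCond are illustrative names. Combining with & requires every condition to hold; swapping "&" for "|" gives the any-condition behaviour of the original loop.

```r
library(data.table)

dt <- as.data.table(mtcars)
dt[, filtered := FALSE]

filterColumnNames <- c("cyl", "gear", "carb")
filterValues <- list(cyl = c(4, 6), gear = c(3), carb = c(1))

# One call per column, e.g. cyl %in% c(4, 6), using the column name
# as a symbol rather than as the string "cyl".
conds <- lapply(filterColumnNames, function(col)
  call("%in%", as.name(col), filterValues[[col]]))

# Chain the calls together with & into a single i expression.
allCond <- Reduce(function(a, b) call("&", a, b), conds)
print(allCond)

# eval() inside i lets data.table evaluate the constructed call against dt's columns.
dt[eval(allCond), filtered := TRUE]
```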


You can build the filtering condition as a character string and evaluate it inside i with eval(parse(text = ...)). See the following example:

library(data.table)

d <- mtcars
setDT(d)
filtering_condition <- "cyl==6"
d[eval(parse(text=filtering_condition))]
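The condition string does not have to be written by hand. Here is a sketch that generates it from the filterValues list in the question, deparsing each value vector so that it becomes valid R source text. valueText is an illustrative name; only the known column names and deparsed values are pasted into the string.

```r
library(data.table)

d <- as.data.table(mtcars)
filterValues <- list(cyl = c(4, 6), gear = c(3), carb = c(1))

# Turn each value vector into R source text, e.g. c(4, 6) -> "c(4, 6)".
valueText <- vapply(filterValues,
                    function(v) paste(deparse(v), collapse = ""),
                    character(1))

# Gives "cyl %in% c(4, 6) & gear %in% 3 & carb %in% 1"
filtering_condition <- paste(sprintf("%s %%in%% %s", names(filterValues), valueText),
                             collapse = " & ")

d[eval(parse(text = filtering_condition))]
```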
