I would like to know if there is an elegant and concise way to do conditional filtering with data.table.

My aim is the following: if condition 1 is met, filter based on condition 2.

For instance, in the case of the iris dataset, how can I drop the observations among Species=="setosa" where Sepal.Length<5.5, while keeping all observations with Sepal.Length<5.5 for other species?

I know how to do this in steps, but I wonder if there is a better way to do it in a single line.

# This is how I would do it in steps.

library(data.table)
library(dplyr)  # provides full_join()

data("iris")
setDT(iris)  # convert to a data.table by reference

# First, select only the setosa observations I am interested in keeping.
iris1 <- iris[Sepal.Length >= 5.5 & Species == "setosa"]

# Second, drop all setosa observations.
iris2 <- iris[Species != "setosa"]

# Finally, combine the two pieces.
iris_final <- full_join(iris1, iris2)

head(iris_final)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1:          5.8         4.0          1.2         0.2     setosa
2:          5.7         4.4          1.5         0.4     setosa
3:          5.7         3.8          1.7         0.3     setosa
4:          5.5         4.2          1.4         0.2     setosa
5:          5.5         3.5          1.3         0.2     setosa # only keeping setosa with Sepal.Length>=5.5. Note that for other species, Sepal.Length can be <5.5
6:          7.0         3.2          4.7         1.4 versicolor

Is there a more concise and elegant way of doing this?

2 Answers

Is something like the following what you are looking for? It is not very clear what you want.

library(data.table)

dt <- data.table(iris)
# Parentheses make the precedence explicit: & already binds tighter than |
dt[(Sepal.Length >= 5.5 & Species == "setosa") | Species != "setosa"]

#>      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#>   1:          5.8         4.0          1.2         0.2    setosa
#>   2:          5.7         4.4          1.5         0.4    setosa
#>   3:          5.7         3.8          1.7         0.3    setosa
#>   4:          5.5         4.2          1.4         0.2    setosa
#>   5:          5.5         3.5          1.3         0.2    setosa
#>  ---                                                            
#> 101:          6.7         3.0          5.2         2.3 virginica
#> 102:          6.3         2.5          5.0         1.9 virginica
#> 103:          6.5         3.0          5.2         2.0 virginica
#> 104:          6.2         3.4          5.4         2.3 virginica
#> 105:          5.9         3.0          5.1         1.8 virginica
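
As a quick sanity check, the following sketch compares the one-line filter with the stepwise approach from the question. It assumes only the built-in iris data; rbind() stands in for full_join() so that dplyr is not needed, and the names kept_setosa, others, stepwise and one_liner are illustrative.

```r
library(data.table)

dt <- as.data.table(iris)

# Stepwise version from the question, with rbind() in place of full_join()
kept_setosa <- dt[Species == "setosa" & Sepal.Length >= 5.5]
others      <- dt[Species != "setosa"]
stepwise    <- rbind(kept_setosa, others)

# One-line version: keep the qualifying setosa rows OR any non-setosa row
one_liner <- dt[(Species == "setosa" & Sepal.Length >= 5.5) | Species != "setosa"]

# Both approaches should hold the same rows in the same order
same_result <- isTRUE(all.equal(stepwise, one_liner))
print(same_result)
print(nrow(one_liner))
```

Because iris is already sorted by species, both results list the retained setosa rows first, so a direct comparison works without reordering.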

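You can use the | (OR) operator together with a negated condition:

This removes any rows where Species=="setosa" & Sepal.Length<5.5 and keeps every other row. The trailing | Sepal.Length>5.5 is logically redundant, because any row with Sepal.Length>5.5 already passes the negated condition; it only spells out the intent.

iris1 <- as.data.table(iris)  # the full iris data, not the setosa subset from the question
iris1[!(Species=="setosa" & Sepal.Length<5.5) | Sepal.Length>5.5]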
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
  1:          5.8         4.0          1.2         0.2    setosa
  2:          5.7         4.4          1.5         0.4    setosa
  3:          5.7         3.8          1.7         0.3    setosa
  4:          5.5         4.2          1.4         0.2    setosa
  5:          5.5         3.5          1.3         0.2    setosa
 ---                                                            
101:          6.7         3.0          5.2         2.3 virginica
102:          6.3         2.5          5.0         1.9 virginica
103:          6.5         3.0          5.2         2.0 virginica
104:          6.2         3.4          5.4         2.3 virginica
105:          5.9         3.0          5.1         1.8 virginica
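
The same filter can be written in two shorter, equivalent forms. The sketch below is only an illustration on the built-in iris data; the names dt, negated and de_morgan are made up for the example.

```r
library(data.table)

dt <- as.data.table(iris)

# Drop exactly the rows that meet both conditions
negated <- dt[!(Species == "setosa" & Sepal.Length < 5.5)]

# De Morgan equivalent: keep rows that fail at least one of the two conditions
de_morgan <- dt[Species != "setosa" | Sepal.Length >= 5.5]

# The two forms should select the same rows
forms_agree <- isTRUE(all.equal(negated, de_morgan))
print(forms_agree)
```

The pattern generalises: "if condition 1 holds, require condition 2" amounts to keeping rows where condition 1 is false or condition 2 is true.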
