How to use a vector for creating a logical expression for subsetting a data frame?

Question

I am trying to use a vector of logical expressions to subset a data frame. I have a data frame that I want to subset based on several columns, excluding "B" each time. First I want to define a vector of logical expressions based on the data frame's column names.

set.seed(42) 
n <- 24
dataframe <- data.frame(column1=as.character(factor(paste("obs",1:n))),
                        rand1=rep(LETTERS[1:4], n/4), 
                        rand2=rep(LETTERS[1:6], n/6), 
                        rand3=rep(LETTERS[1:3], n/3), 
                        x=rnorm(n))
columns <- colnames(dataframe)[2:4]
criteria <- quote(rep(paste0(columns[1:3], " != ", quote("B")), length(columns)))

What I want to achieve is a vector criteria containing rand1 != "B", rand2 != "B", rand3 != "B", so that I can use it to subset the data frame one column at a time, like this:

dfs1 <- subset(dataframe, criteria[1])
dfs2 <- subset(dataframe, criteria[2])
dfs3 <- subset(dataframe, criteria[3])

Would using filter be an option for you? E.g. dataframe %>% dplyr::filter(rand1 != "B", rand2 != "B", rand3 != "B")
– Julian, Commented Apr 27, 2022 at 8:43
That may work, but I would still have to write all the conditions manually. The example has only three, but I have many more. That's why I want to automate it.
– mschmidt, Commented Apr 27, 2022 at 8:46

jlhoward · Accepted Answer · 2022-04-28 09:34:51Z

I might be misunderstanding your question, but it seems like you want a collection of data.frames where each one excludes the rows where a given column == 'B'.

Assuming this is what you want:

cols <- c('rand1', 'rand2', 'rand3')              
result <- lapply(dataframe[, cols], function(x) dataframe[x!='B',])

will create a list of data.frames, each of which has the result of excluding rows where the indicated column == 'B'.
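If the conditions really do need to exist as strings such as rand1 != "B", as the question describes, one possible sketch is to build them with paste0() and evaluate each one against the data frame. This is only an illustration under that assumption, not necessarily the best approach: the name subsets is hypothetical, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example data from the question so this snippet is self-contained.
set.seed(42)
n <- 24
dataframe <- data.frame(column1 = paste("obs", 1:n),
                        rand1 = rep(LETTERS[1:4], n / 4),
                        rand2 = rep(LETTERS[1:6], n / 6),
                        rand3 = rep(LETTERS[1:3], n / 3),
                        x = rnorm(n))
columns <- colnames(dataframe)[2:4]

# One condition string per column: rand1 != "B", rand2 != "B", rand3 != "B"
criteria <- paste0(columns, ' != "B"')

# Parse each string and evaluate it with the data frame's columns in scope;
# the resulting logical vector selects the rows to keep.
subsets <- lapply(criteria, function(cr) {
  keep <- eval(parse(text = cr), envir = dataframe)
  dataframe[keep, ]
})
names(subsets) <- columns  # `subsets` is a hypothetical name for the result list
```

Each element can then be accessed by column name, e.g. subsets$rand1 or subsets[["rand2"]]. Evaluating parsed strings is generally more fragile than indexing with logical vectors directly, so the lapply() version above is usually the simpler choice when the condition is the same for every column.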

Julian · Accepted Answer · 2022-04-27 16:15:35Z

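Based on "Using tidy eval for multiple, arbitrary filter conditions":

library(dplyr)  # filter(), %>%
library(purrr)  # map2()

filter_fun <- function(df, cols, conds){
  # Build one quosure per column/value pair, e.g. rand1 != "B"
  fp <- map2(cols, conds, function(x, y) quo((!!(as.name(x))) != !!y))
  # Splice all conditions into filter(); they are combined with AND
  filter(df, !!!fp)
}

filter_col <- columns[1:3] %>% as.list()
cond_list <- rep(list("B"), length(columns[1:3]))

filter_fun(dataframe, cols = filter_col,
           conds = cond_list)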


  column1 rand1 rand2 rand3          x
1    obs 1     A     A     A  1.3709584
2    obs 3     C     C     C  0.3631284
3    obs 4     D     D     A  0.6328626
4    obs 7     C     A     A  1.5115220
5    obs 9     A     C     C  2.0184237
6   obs 12     D     F     C  2.2866454
7   obs 13     A     A     A -1.3888607
8   obs 15     C     C     C -0.1333213
9   obs 16     D     D     A  0.6359504
10  obs 19     C     A     A -2.4404669
11  obs 21     A     C     C -0.3066386
12  obs 24     D     F     C  1.2146747
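For comparison, the same "all conditions at once" filter can be sketched in base R without tidy evaluation. This is an alternative illustration rather than part of the approach above: the names checks, keep_all and filtered are hypothetical, and the data frame is rebuilt only so the snippet is self-contained.

```r
# Rebuild the example data from the question so this snippet is self-contained.
set.seed(42)
n <- 24
dataframe <- data.frame(column1 = paste("obs", 1:n),
                        rand1 = rep(LETTERS[1:4], n / 4),
                        rand2 = rep(LETTERS[1:6], n / 6),
                        rand3 = rep(LETTERS[1:3], n / 3),
                        x = rnorm(n))
columns <- colnames(dataframe)[2:4]

# One logical vector per column, TRUE where the value is not "B".
checks <- lapply(dataframe[columns], function(x) x != "B")

# Combine the vectors element-wise with `&`, so a row is kept only if
# it passes the check in every listed column.
keep_all <- Reduce(`&`, checks)

filtered <- dataframe[keep_all, ]
```

This should keep the same twelve observations as the output above. One visible difference is that base R indexing preserves the original row names (1, 3, 4, ...), whereas dplyr::filter() renumbers them.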
