
I want to filter my data frame based on a variable that may or may not exist. As an expected output, I want a df that is filtered (if it has the filter variable), or the original, unfiltered df (if the variable is missing).

Here is a minimal example:

library(tidyverse)
df1 <- 
tribble(~a,~b,
        1L,"a",
        0L, "a",
        0L,"b",
        1L, "b")
df2 <- select(df1, b)

Filtering on df1 returns the required result, a filtered tibble.

filter(df1, a == 1)
# A tibble: 2 x 2
      a     b
  <int> <chr>
1     1     a
2     1     b

But the second one throws an error, as expected, because the variable is not in the df.

filter(df2, a == 1)
Error in filter_impl(.data, quo) : 
  Evaluation error: object 'a' not found.

I tried filter_at, which would be an obvious choice, but it throws an error if no variable matches the predicate.

filter_at(df2, vars(matches("a")), any_vars(. == 1L))    
Error: `.predicate` has no matching columns

So, my question is: is there a way to create a conditional filtering that produces the expected outcome, preferably within the tidyverse?

  • I think this Q&A stackoverflow.com/questions/44001722/… should answer your question. Commented Sep 12, 2017 at 14:07
  • What is the expected outcome? Commented Sep 12, 2017 at 14:07
  • For example (as in the linked Q), you can do stuff like df2 %>% filter(if("a" %in% names(.)) a == 1 else TRUE) or df2 %>% {if("a" %in% names(.)) filter(., a == 1) else .} Commented Sep 12, 2017 at 14:09
  • @JanvanderLaan OP wants the original data back if the variable doesn't exist ("the original, unfiltered df (if the variable is missing)."). Commented Sep 12, 2017 at 14:19
  • Can't you just wrap it in a try or tryCatch then? Or do you want to be able to use it in pipe chains? Commented Sep 12, 2017 at 15:06

2 Answers


As @docendo-discimus pointed out in the comments, the following solutions work. I also used rlang::has_name instead of "a" %in% names(.).

This Q&A contains the original idea: Conditionally apply pipeline step depending on external value.

df1 %>% 
   filter(if(has_name(., "a")) a == 1 else TRUE)
# A tibble: 2 x 2
      a     b
  <int> <chr>
1     1     a
2     1     b

df2 %>% 
   filter(if(has_name(., "a")) a == 1 else TRUE)
# A tibble: 4 x 1
      b
  <chr>
1     a
2     a
3     b
4     b
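
The else TRUE branch above works because filter() recycles a single TRUE across every row, so no rows are dropped, and because if only evaluates the branch it takes, so a is never looked up when the column is missing. A minimal sketch of that behaviour, assuming dplyr is installed (the names df, no_a, all_rows and kept are made up for illustration):

```r
library(dplyr)

# Same shape as df1 in the question (hypothetical name `df`)
df <- tibble(a = c(1L, 0L, 0L, 1L), b = c("a", "a", "b", "b"))

# A length-one TRUE is recycled to every row, so nothing is removed
all_rows <- filter(df, TRUE)

# With the column absent, `if` takes the else branch and `a` is never evaluated
no_a <- select(df, b)
kept <- no_a %>% filter(if ("a" %in% names(.)) a == 1L else TRUE)

nrow(all_rows)
nrow(kept)
```

Both calls should report the full row count of the input.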

Or alternatively, by using {}:

df1 %>%
  {if(has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 2 x 2
      a     b
  <int> <chr>
1     1     a
2     1     b

df2 %>%
  {if(has_name(., "a")) filter(., a == 1L) else .}
# A tibble: 4 x 1
      b
  <chr>
1     a
2     a
3     b
4     b


Something like this?

# function for expected output:
# filter x on column y if it exists, otherwise return x unchanged
foo <- function(x, y){
  if(y %in% colnames(x)){
    filter(x, .data[[y]] == 1)
  }else{
    x
  }
}

# run the functions
foo(df1, "a")
foo(df2, "a")
# or

df1 %>% foo("a")
# A tibble: 2 x 2
      a     b
  <int> <chr>
1     1     a
2     1     b

df2 %>% foo("a")
# A tibble: 4 x 1
      b
  <chr>
1     a
2     a
3     b
4     b

1 Comment

Thanks, it is a fine answer, but I was looking for a tidyverse solution.
