I'm looking for a solution in dplyr for the task of selecting columns of a dataframe based on multiple conditions. Say, we have this type of df:

X <- c("B", "C", "D", "E")
a1 <- c(1, 0, 3, 0)
a2 <- c(235, 270, 100, 1)
a3 <- c(3, 1000, 900, 2)
df1 <- data.frame(X, a1, a2, a3)

Let's further assume I want to select the column(s) that

  • (i) are numeric
  • (ii) contain only values smaller than 5

That is, in this case, what we want to select is column a1. How can this be done in dplyr? My understanding is that in order to select a column in dplyr you use select and, if that selection is governed by conditions, also where. But how do I combine two such where() conditions? This, for example, is not the right way to do it, as it throws an error:

df1 %>%
  select(where(is.numeric) & where(~ all(.) < 5))
Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In all(.) : coercing argument of type 'character' to logical
  • Remarkably similar to this, posted earlier today. I think the answers there will help you here. Commented Jun 3, 2022 at 11:41
  • No wonder it's similar, as the present Q is a spin-off of my attempts at finding a solution for that other Q. But no, there's no help there for this Q. Commented Jun 3, 2022 at 11:43
  • Ah! OK. I did wonder... By the way, df1 %>% select(where(\(.) is.numeric(.) & all(.) <5)) removes the error, but gives the wrong answer. :=( I was on the right line, but @benson23 beat me to the answer. Commented Jun 3, 2022 at 11:44

2 Answers

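The function passed to where() must return a single TRUE or FALSE for each column. In your attempt, all(.) < 5 compares the result of all() with 5 instead of comparing each value with 5; for the character column X this yields NA, hence the error. Move the comparison inside all():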

library(dplyr)

select(df1, \(x) all(x < 5))

# or this, which might be more semantically correct: the predicate is wrapped
# in where(), and && skips the comparison for non-numeric columns
select(df1, where(\(x) is.numeric(x) && all(x < 5)))

  a1
1  1
2  0
3  3
4  0
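
To see why the placement of the parenthesis matters, here is a minimal sketch that applies both variants of the predicate to single columns. The helper names wrong, right and is_small_numeric are made up for illustration and are not part of dplyr.

```r
library(dplyr)

df1 <- data.frame(
  X  = c("B", "C", "D", "E"),
  a1 = c(1, 0, 3, 0),
  a2 = c(235, 270, 100, 1),
  a3 = c(3, 1000, 900, 2)
)

# Hypothetical helpers, named only for this illustration
wrong <- function(x) all(x) < 5   # collapses the column first, then compares one value with 5
right <- function(x) all(x < 5)   # compares every value with 5, then collapses

# all() coerces its argument to logical (with a warning), so the comparison
# with 5 no longer looks at the actual values
wrong_a2 <- suppressWarnings(wrong(df1$a2))  # TRUE, although a2 contains 235
wrong_X  <- suppressWarnings(wrong(df1$X))   # NA, which where() rejects
right_a2 <- right(df1$a2)                    # FALSE

# Checking the type first means character columns never reach the comparison
is_small_numeric <- function(x) is.numeric(x) && all(x < 5)

result <- df1 %>% select(where(is_small_numeric))
names(result)
#> [1] "a1"
```

The TRUE returned for a2 is also why the variant with all(.) < 5 mentioned in the comments runs without error but selects the wrong columns.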

Data

df1 <- structure(list(X = c("B", "C", "D", "E"), a1 = c(1, 0, 3, 0), 
    a2 = c(235, 270, 100, 1), a3 = c(3, 1000, 900, 2)), class = "data.frame", row.names = c(NA, 
-4L))

5 Comments

Cool, thanks a lot. Is there a way to shorten the code a little by omitting function(x) and stuff? I've tried it myself but can't get it to work...
@ChrisRuehlemann Actually yes, we only need one all function without where
Why won't you use the formula notation for the function, i.e. where(~is.numeric(.) & all(. < 5))?
@onyambu Thank you for pointing this out! I didn't notice this until now, when I'm using across, I would use the ~ syntax, but in where, I somehow always use function(x)... It's only my personal habit and I guess I would use ~ in where in the future, thanks again!
you can use ~ in all tidyverse functions. You are not limited to only across and where

Another possible solution, based on dplyr::mutate:

library(dplyr)

df1 %>% 
  mutate(across(everything(), ~ if (is.numeric(.x) && all(.x < 5)) .x))

#>   a1
#> 1  1
#> 2  0
#> 3  3
#> 4  0

Or, more concisely (this works here only because the comparison with 5 happens to be FALSE for the character column X):

df1 %>% 
  mutate(across(everything(), ~ if (all(.x < 5)) .x))
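
Dropping the is.numeric() check is not safe in general. As a rough sketch, using made-up data (df2) and hypothetical helper names (loose, strict), a character column whose strings happen to sort before "5" passes the shortened predicate, because the number 5 is converted to the string "5" and the comparison is done on strings:

```r
library(dplyr)

# Hypothetical data: id is a character column whose values look like small numbers
df2 <- data.frame(
  id = c("1", "2", "3"),
  v1 = c(1, 2, 4),
  v2 = c(1, 2, 40)
)

loose  <- function(x) all(x < 5)                   # no type check
strict <- function(x) is.numeric(x) && all(x < 5)  # type check first

loose_cols  <- names(select(df2, where(loose)))
strict_cols <- names(select(df2, where(strict)))

loose_cols
#> [1] "id" "v1"
strict_cols
#> [1] "v1"
```

With df1 the two predicates agree, so the shorter form is fine for the example in the question; the explicit type check only matters for data like df2.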
