In R, how can I change many select (binary) columns in a dataframe into factors?

Question

I have a dataset with many columns and I'd like to locate the columns that have fewer than n unique responses and change just those columns into factors.

Here is one way I was able to do that:

# create sample data frame
df <- data.frame("number"  = c(1, 2.7, 8, 5),
                 "binary1" = c(1, 0, 1, 1),
                 "answer"  = c("Yes", "No", "Yes", "No"),
                 "binary2" = c(0, 0, 1, 0))
n <- 3

# for each column
for (col in colnames(df)) {
  # check whether the column is numeric
  if (is.numeric(df[[col]])) {
    # check that there are fewer than n unique values
    if (length(unique(df[[col]])) < n) {
      df[[col]] <- factor(df[[col]])
    }
  }
}

What is another, hopefully more succinct, way of accomplishing this?

akrun · Accepted Answer · 2021-06-15 20:29:36Z

6

Here is a way using the tidyverse.

We can use where within across to select the columns with a short-circuiting logical expression (&&) that checks:

1. the column is numeric (is.numeric);
2. if 1 is TRUE, whether the number of distinct elements is less than the user-defined n;
3. if 2 is TRUE, whether all the unique elements in the column are 0 or 1.

The selected columns are then looped over and converted to the factor class.

library(dplyr)
df1 <- df %>% 
     mutate(across(where(~is.numeric(.) && 
                           n_distinct(.) < n && 
                           all(unique(.) %in% c(0, 1))),  factor))

Checking:

str(df1)
'data.frame':   4 obs. of  4 variables:
 $ number : num  1 2.7 8 5
 $ binary1: Factor w/ 2 levels "0","1": 2 1 2 2
 $ answer : chr  "Yes" "No" "Yes" "No"
 $ binary2: Factor w/ 2 levels "0","1": 1 1 2 1
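The question's own criterion is only "numeric with fewer than n unique values"; the 0/1 check above is an extra restriction. A minimal sketch of the looser version, assuming that is what is wanted, simply drops the third condition (df2 is a hypothetical name):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(number  = c(1, 2.7, 8, 5),
                 binary1 = c(1, 0, 1, 1),
                 answer  = c("Yes", "No", "Yes", "No"),
                 binary2 = c(0, 0, 1, 0))
n <- 3

# Convert every numeric column with fewer than n distinct values,
# whatever those values are (no 0/1 restriction)
df2 <- df %>%
  mutate(across(where(~ is.numeric(.x) && n_distinct(.x) < n), factor))

str(df2)
```

With this sample data the result matches the 0/1 version, because binary1 and binary2 are the only numeric columns with fewer than 3 distinct values; the two versions would differ on a column such as c(2, 5, 5, 2).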

edited Jun 15, 2021 at 20:29

answered Jun 15, 2021 at 20:23

akrun

18 Comments

akrun Over a year ago

@GregorThomas Suppose a column contains only 0 or only 1; I am not sure whether the OP wanted to convert those to factors. Also, the unique values could be 2, 3, 4 or 5. I guess the OP is specifically looking for the binary columns.

Gregor Thomas Over a year ago

Despite their use of "binary" a couple of times, the only check the OP attempts in the question is the n_distinct one. The "binary" columns may just be an example.

Gregor Thomas Over a year ago

Though you've explained your steps well enough that they should be able to adapt as needed.

akrun Over a year ago

@Mark The ~ creates a compact lambda (anonymous function) in the tidyverse, similar to function(x). Inside it, . or .x refers to the argument, i.e. the column values. In base R 4.1.0 and later, you can use \(x) as a compact form of function(x).
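The three lambda spellings should select the same columns. This is a small sketch reusing the question's df and n; the \(x) form assumes R 4.1.0 or later, and the out_* names are illustrative only.

```r
library(dplyr)

df <- data.frame(number  = c(1, 2.7, 8, 5),
                 binary1 = c(1, 0, 1, 1),
                 answer  = c("Yes", "No", "Yes", "No"),
                 binary2 = c(0, 0, 1, 0))
n <- 3

# purrr-style formula lambda: .x is the column being tested
out_tilde <- df %>%
  mutate(across(where(~ is.numeric(.x) && n_distinct(.x) < n), factor))

# ordinary anonymous function
out_function <- df %>%
  mutate(across(where(function(x) is.numeric(x) && n_distinct(x) < n), factor))

# base R shorthand for function(x), available from R 4.1.0
out_backslash <- df %>%
  mutate(across(where(\(x) is.numeric(x) && n_distinct(x) < n), factor))

identical(out_tilde, out_function)
identical(out_tilde, out_backslash)
```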

akrun Over a year ago

@Mark You may have noticed where(is.numeric) used in select. Here, I use a lambda expression because there are multiple conditions joined by &&. So, use either ~ is.numeric(.) && ... or function(x) is.numeric(x) && .... Lambda expressions are also commonly used with lapply/sapply/apply-based functions in base R, where x is the value of the column currently being looped over.
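In base R the same kind of predicate can be passed to vapply() as an anonymous function, with x bound to each column in turn. A minimal sketch (is_candidate is a hypothetical name), using only the "numeric with fewer than n unique values" test:

```r
df <- data.frame(number  = c(1, 2.7, 8, 5),
                 binary1 = c(1, 0, 1, 1),
                 answer  = c("Yes", "No", "Yes", "No"),
                 binary2 = c(0, 0, 1, 0))
n <- 3

# vapply() calls the lambda once per column; x is that column's values
is_candidate <- vapply(df, function(x) is.numeric(x) && length(unique(x)) < n,
                       logical(1))

# Convert only the selected columns; df stays a data.frame
df[is_candidate] <- lapply(df[is_candidate], factor)
str(df)
```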

Anoushiravan R · Accepted Answer · 2021-06-16 11:30:12Z

2

You can also use purrr's imap function here; it passes each column as .x and its name as .y. A thousand thanks to my dear friend @akrun, who never ceases to inspire us:

library(dplyr)
library(purrr)

n <- 3

df %>% 
  imap_dfc(~ if (is.numeric(.x) && length(unique(.x)) < n &&
                 all(unique(.x) %in% c(0, 1))) {
    factor(df[[.y]])  # .y is the column name, so df[[.y]] is the same column as .x
  } else {
    df[[.y]]
  })

# A tibble: 4 x 4
  number binary1 answer binary2
   <dbl> <fct>   <chr>  <fct>  
1    1   1       Yes    0      
2    2.7 0       No     0      
3    8   1       Yes    1      
4    5   1       No     0
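To see what imap supplies, here is a quick sketch that only reports each column's name (.y) and class (.x); col_info is a hypothetical name, and answer is assumed to be character (the R 4.0.0+ default for data.frame()).

```r
library(purrr)

df <- data.frame(number  = c(1, 2.7, 8, 5),
                 binary1 = c(1, 0, 1, 1),
                 answer  = c("Yes", "No", "Yes", "No"),
                 binary2 = c(0, 0, 1, 0))

# imap_chr() calls the lambda with .x = column values and .y = column name
col_info <- imap_chr(df, ~ paste0(.y, ": ", class(.x)))
col_info
```

Because .x already holds the column, factor(.x) would give the same result as factor(df[[.y]]) in the answer above.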

edited Jun 16, 2021 at 11:30

answered Jun 15, 2021 at 23:13

Anoushiravan R

ThomasIsCoding · Accepted Answer · 2021-06-15 20:56:21Z

1

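A base R option (list2DF() requires R 4.0.0 or later):

out <- list2DF(
    lapply(
        df,
        function(x) {
            # convert only columns with fewer than n unique values, all of them 0 or 1
            if (length(unique(x)) < n && all(x %in% c(0, 1))) as.factor(x) else x
        }
    )
)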

gives

> str(out)
'data.frame':   4 obs. of  4 variables:
 $ number : num  1 2.7 8 5
 $ binary1: Factor w/ 2 levels "0","1": 2 1 2 2
 $ answer : chr  "Yes" "No" "Yes" "No"
 $ binary2: Factor w/ 2 levels "0","1": 1 1 2 1
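If list2DF() is unavailable, assigning into df[] is a common base R alternative that keeps the data.frame class and row names. A sketch with the same condition, where out2 is a hypothetical name:

```r
df <- data.frame(number  = c(1, 2.7, 8, 5),
                 binary1 = c(1, 0, 1, 1),
                 answer  = c("Yes", "No", "Yes", "No"),
                 binary2 = c(0, 0, 1, 0))
n <- 3

out2 <- df
# out2[] <- list(...) replaces the columns in place, so out2 stays a data.frame
out2[] <- lapply(out2, function(x) {
  if (length(unique(x)) < n && all(x %in% c(0, 1))) factor(x) else x
})
str(out2)
```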

answered Jun 15, 2021 at 20:56

ThomasIsCoding
