I want to count the number of instances of some text (or factor level) row-wise, across a subset of columns, using dplyr.

Here's the input:

> input_df
  num_col_1 num_col_2 text_col_1 text_col_2
1         1         4        yes        yes
2         2         5         no        yes
3         3         6         no       <NA>

And here's the desired output:

> output_df
  num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
1         1         4        yes        yes       2
2         2         5         no        yes       1
3         3         6         no       <NA>       0

In sum_yes we have counted the number of "yes" in that row.

I have tried two methods:

Attempted solution 1:

text_cols = c("text_col_1","text_col_2")
df = input_df %>% mutate(sum_yes = rowSums( select(text_cols) == "yes" ), na.rm = TRUE)

Errors with:

Error in mutate_impl(.data, dots) : 
  Evaluation error: no applicable method for 'select_' applied to an object of class "character".

Attempted solution 2:

text_cols = c("text_col_1","text_col_2")
df = input_df %>% select(text_cols) %>% rowsum("yes", na.rm = TRUE)

Errors with:

Error in rowsum.data.frame(., "yes", na.rm = TRUE) : 
  incorrect length for 'group'

2 Answers

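  1. We can use mutate and count the number of "yes" values in each row. Pass na.rm = TRUE to rowSums so that a missing value is skipped instead of turning the whole row total into NA.
library(dplyr)
df %>% mutate(sum_yes = rowSums(.[text_cols] == "yes", na.rm = TRUE))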

#   num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#*     <int>     <int> <fct>      <fct>        <int>
#1         1         4 yes        yes              2
#2         2         5 no         yes              1
#3         3         6 no         <NA>             0

Inspired by this answer.

  2. rowwise with c_across (dplyr >= 1.0.0):
df %>%
  rowwise() %>%
  mutate(sum_yes = sum(c_across(all_of(text_cols)) == "yes", na.rm = TRUE)) %>%
  ungroup()
  3. do with rowwise (do() is superseded in current dplyr, so prefer the options above):
df %>%
  rowwise() %>%
  do((.) %>% as.data.frame %>%
  mutate(sum_yes = sum(.[text_cols] == "yes", na.rm = TRUE)))
  4. without do and rowwise (note that select keeps only the text columns in the result):
df %>%
 select(all_of(text_cols)) %>%
 mutate(sum_yes = rowSums(. == "yes", na.rm = TRUE))
  5. In base R, it is actually simpler:
df$sum_yes <- rowSums(df[text_cols] == "yes", na.rm = TRUE)
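
To see why the missing value matters, here is a minimal, self-contained sketch. It assumes the text columns are factors, as in the question, and that dplyr >= 1.0.0 is installed so that across() is available. The across() form is just one more way to spell the same row-wise count, not something taken from the question.

```r
library(dplyr)

# Rebuild the example data from the question
input_df <- data.frame(
  num_col_1  = 1:3,
  num_col_2  = 4:6,
  text_col_1 = factor(c("yes", "no", "no")),
  text_col_2 = factor(c("yes", "yes", NA))
)
text_cols <- c("text_col_1", "text_col_2")

# Without na.rm, comparing NA to "yes" gives NA, and that NA
# propagates into the total for row 3
no_na_rm <- rowSums(input_df[text_cols] == "yes")

# With na.rm = TRUE the missing value is skipped, so row 3 counts as 0
output_df <- input_df %>%
  mutate(sum_yes = rowSums(across(all_of(text_cols), ~ .x == "yes"),
                           na.rm = TRUE))
```

Here no_na_rm has NA in its third position, while output_df$sum_yes holds 2, 1 and 0, matching the desired output in the question.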

4 Comments

I only want to operate on a subset of columns, won't the base R answer operate on ALL columns?
@RNs_Ghost you can select the columns of your choice. Updated the answer.
hi fellows, how can I do the same but replacing rowSums with newest dplyr's c_across? - I cannot make it work!
@JPV See the 2nd point in my updated answer.

We can also use reduce with map

library(tidyverse)
df %>%
  select(all_of(text_cols)) %>%
  map(~ .x == "yes" & !is.na(.x)) %>%
  reduce(`+`) %>%
  bind_cols(df, sum_yes = .)
#   num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#1         1         4        yes        yes       2
#2         2         5         no        yes       1
#3         3         6         no       <NA>       0
