Compute row-wise counts in subsets of columns in dplyr

Question

I want to count, row-wise, how many times a given text value (or factor level) occurs across a subset of columns, using dplyr.

Here's the input:

> input_df
  num_col_1 num_col_2 text_col_1 text_col_2
1         1         4        yes        yes
2         2         5         no        yes
3         3         6         no       <NA>

And here's the desired output:

> output_df
  num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
1         1         4        yes        yes       2
2         2         5         no        yes       1
3         3         6         no       <NA>       0

In sum_yes we have counted the number of "yes" in that row.

I have tried two methods:

Attempted solution 1:

text_cols = c("text_col_1","text_col_2")
df = input_df %>% mutate(sum_yes = rowSums( select(text_cols) == "yes" ), na.rm = TRUE)

Errors with:

Error in mutate_impl(.data, dots) : 
  Evaluation error: no applicable method for 'select_' applied to an object of class "character".

Attempted solution 2:

text_cols = c("text_col_1","text_col_2")
df = input_df %>% select(text_cols) %>% rowsum("yes", na.rm = TRUE)

Errors with:

Error in rowsum.data.frame(., "yes", na.rm = TRUE) : 
  incorrect length for 'group'

Ronak Shah · Accepted Answer · 2021-02-19 01:24:42Z

10

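We can use mutate with rowSums to count the "yes" values in each row. Comparing NA with "yes" gives NA, so na.rm = TRUE is needed for the third row to come out as 0 rather than NA.

library(dplyr)
# .[text_cols] keeps only the text columns; the comparison yields a logical matrix
df %>% mutate(sum_yes = rowSums(.[text_cols] == "yes", na.rm = TRUE))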

#   num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#*     <int>     <int> <fct>      <fct>        <int>
#1         1         4 yes        yes              2
#2         2         5 no         yes              1
#3         3         6 no         <NA>             0

Inspired by this answer.
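The question does not include code to build the input, so here is a minimal base R sketch that rebuilds it and shows how the missing value affects rowSums. It assumes the text columns are factors (as the printed output suggests); the names is_yes, with_na and without_na are only illustrative.

```r
# Rebuild the question's input; text columns assumed to be factors.
input_df <- data.frame(
  num_col_1 = 1:3,
  num_col_2 = 4:6,
  text_col_1 = factor(c("yes", "no", "no")),
  text_col_2 = factor(c("yes", "yes", NA))
)
text_cols <- c("text_col_1", "text_col_2")

# Logical matrix of matches; the NA cell stays NA after the comparison.
is_yes <- input_df[text_cols] == "yes"

with_na    <- rowSums(is_yes)                # third row is NA
without_na <- rowSums(is_yes, na.rm = TRUE)  # third row is 0
```

Whether a row with a missing value should count as 0 or as NA depends on the analysis; the desired output in the question treats it as 0.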

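rowwise with c_across (requires dplyr >= 1.0.0):

df %>%
  rowwise() %>%
  mutate(sum_yes = sum(c_across(all_of(text_cols)) == "yes", na.rm = TRUE)) %>%
  ungroup()  # drop the row-wise grouping so later verbs behave normally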
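For anyone who could not get c_across to work, a self-contained sketch may help. It assumes dplyr 1.0.0 or later and, for simplicity, stores the text columns as character vectors rather than factors; output_df is just the name used in the question for the desired result.

```r
library(dplyr)

# Input from the question, with character (not factor) text columns.
input_df <- data.frame(
  num_col_1 = 1:3,
  num_col_2 = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)
text_cols <- c("text_col_1", "text_col_2")

output_df <- input_df %>%
  rowwise() %>%   # evaluate the mutate expression once per row
  mutate(sum_yes = sum(c_across(all_of(text_cols)) == "yes", na.rm = TRUE)) %>%
  ungroup()       # remove the row-wise grouping again
```

On large data frames the row-by-row evaluation tends to be slower than the vectorised rowSums variants, so this form is mostly useful when the per-row computation has no vectorised equivalent.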

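do with rowwise (note that do is superseded in current dplyr):

df %>%
  rowwise() %>%
  do((.) %>% as.data.frame %>%
       # restrict the count to text_cols and ignore missing values
       mutate(sum_yes = sum(.[text_cols] == "yes", na.rm = TRUE)))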

Without do and rowwise, using select inside mutate so that the other columns are kept:

df %>%
  mutate(sum_yes = rowSums(select(., all_of(text_cols)) == "yes", na.rm = TRUE))

In base R, it is actually simpler:

df$sum_yes <- rowSums(df[text_cols] == "yes", na.rm = TRUE)
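If the same count is needed for several values or column sets, the base R one-liner can be wrapped in a small function. This is only a sketch: count_matches is a hypothetical helper name, not part of base R or dplyr, and the input is rebuilt here with character columns so the block runs on its own.

```r
# Hypothetical helper: count, per row, how many of `cols` equal `value`.
count_matches <- function(data, cols, value) {
  stopifnot(all(cols %in% names(data)))       # fail early on misspelled columns
  rowSums(data[cols] == value, na.rm = TRUE)  # NA cells count as non-matches
}

input_df <- data.frame(
  num_col_1 = 1:3,
  num_col_2 = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)
text_cols <- c("text_col_1", "text_col_2")

input_df$sum_yes <- count_matches(input_df, text_cols, "yes")
input_df$sum_no  <- count_matches(input_df, text_cols, "no")
```

Because only the columns named in cols are indexed, the numeric columns never take part in the comparison.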

edited Feb 19, 2021 at 1:24

answered Aug 10, 2018 at 9:26

Ronak Shah



4 Comments

RNs_Ghost Over a year ago

I only want to operate on a subset of columns, won't the base R answer operate on ALL columns?

Ronak Shah Over a year ago

@RNs_Ghost you can select the columns of your choice. Updated the answer.

JPV Over a year ago

hi fellows, how can I do the same but replacing rowSums with newest dplyr's c_across? - I cannot make it work!

Ronak Shah Over a year ago

@JPV See the 2nd point in my updated answer.

akrun · Accepted Answer · 2018-08-10 14:00:35Z

1

We can also use reduce with map

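library(tidyverse)
df %>%
  select(all_of(text_cols)) %>%
  # one logical vector per column; NA is treated as "not yes"
  map(~ .x == "yes" & !is.na(.x)) %>%
  # add the logical vectors element-wise to get the per-row count
  reduce(`+`) %>%
  bind_cols(df, sum_yes = .)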
#   num_col_1 num_col_2 text_col_1 text_col_2 sum_yes
#1         1         4        yes        yes       2
#2         2         5         no        yes       1
#3         3         6         no       <NA>       0
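The same map-then-reduce idea can be sketched in base R with lapply and Reduce, for readers who do not want to load the tidyverse. This is an analogue rather than the code above; flags is an illustrative name and the input is rebuilt with character columns so the block is self-contained.

```r
input_df <- data.frame(
  num_col_1 = 1:3,
  num_col_2 = 4:6,
  text_col_1 = c("yes", "no", "no"),
  text_col_2 = c("yes", "yes", NA),
  stringsAsFactors = FALSE
)
text_cols <- c("text_col_1", "text_col_2")

# Step 1 (the "map"): one logical vector per column, with NA treated as FALSE.
flags <- lapply(input_df[text_cols], function(x) !is.na(x) & x == "yes")

# Step 2 (the "reduce"): add the vectors element-wise to get per-row counts.
input_df$sum_yes <- Reduce(`+`, flags)
```

Handling NA inside the per-column step means no na.rm argument is needed when the vectors are added.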

answered Aug 10, 2018 at 14:00

akrun

