Suppose I have a data frame that looks like this:

fact_code style_serial ss rib button rib_s button_s
1008      style_1018   1   0  0      1     1 
1008      style_1018   0   1  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   0  1      1     1 
1008      style_1003   1   0  1      0     1
1008      style_1003   0   0  1      0     1
1008      style_1003   0   0  0      0     1
1008      style_1003   0   0  0      0     1
1004      style_1197   1   0  0      1     0 
1004      style_1197   0   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   1  0      1     0

The key variables, rib and button, are dummy variables. They indicate whether a particular garment style produced by a factory has a rib, a button, or both. I want to take the maximum of each dummy variable within each fact_code and style_serial group; here I have named the results rib_s and button_s.

The variables rib_s and button_s were generated as follows:

df <- df %>% group_by(fact_code, style_serial) %>% mutate(rib_s = max(rib, na.rm = TRUE))
df <- df %>% group_by(fact_code, style_serial) %>% mutate(button_s = max(button, na.rm = TRUE))

Now suppose that I have around 20 such variables. I want to write a loop that runs once per variable and applies the code above to each of the 20 dummy variables in turn.

I have tried this for the 2 variables as a test:

for (xx in c("rib", "button")){
df <- df %>%
group_by_(fact_code, style_serial) %>%
yy <- paste0(c(xx, "s"), collapse = "_") %>%
mutate_(yy = max(xx, na.rm = TRUE))
}

But it gives me the following error message:

Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "character"

I have also tried base R functions such as tapply and aggregate, but I always get error messages.

Do you have a way to get round this problem?

1 Answer

This can be solved very succinctly using dplyr::mutate_at:

library(dplyr)
key <- c("rib", "button")
df %>%
    group_by(fact_code, style_serial) %>%
    mutate_at(vars(key), funs(max = max(.)))
## A tibble: 12 x 9
## Groups:   fact_code, style_serial [3]
#   fact_code style_serial    ss   rib button rib_s button_s rib_max button_max
#       <int> <fct>        <int> <int>  <int> <int>    <int>   <dbl>      <dbl>
# 1      1008 style_1018       1     0      0     1        1      1.         1.
# 2      1008 style_1018       0     1      0     1        1      1.         1.
# 3      1008 style_1018       0     1      0     1        1      1.         1.
# 4      1008 style_1018       0     0      1     1        1      1.         1.
# 5      1008 style_1003       1     0      1     0        1      0.         1.
# 6      1008 style_1003       0     0      1     0        1      0.         1.
# 7      1008 style_1003       0     0      0     0        1      0.         1.
# 8      1008 style_1003       0     0      0     0        1      0.         1.
# 9      1004 style_1197       1     0      0     1        0      1.         0.
#10      1004 style_1197       0     0      0     1        0      1.         0.
#11      1004 style_1197       0     0      0     1        0      1.         0.
#12      1004 style_1197       0     1      0     1        0      1.         0.

This calculates the per-group maximum of each variable listed in key and creates new columns by appending _max to the corresponding column name. Note that you can also use the usual select helpers (e.g. contains, matches, starts_with, ends_with) within vars(...) if you don't want to (or can't) define key beforehand.
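Since funs() has been deprecated and mutate_at() superseded in later dplyr releases, the same idea can be sketched with across(), assuming dplyr 1.0.0 or newer. The small data frame below is a hypothetical stand-in for the question's data (same column names, fewer rows), and the _s suffix is chosen to match the names used in the question.

```r
library(dplyr)

# Hypothetical stand-in for the question's data: same column names, fewer rows
df <- data.frame(
  fact_code    = c(1008, 1008, 1008, 1008, 1004, 1004),
  style_serial = c("style_1018", "style_1018", "style_1003", "style_1003",
                   "style_1197", "style_1197"),
  rib          = c(0, 1, 0, 0, 0, 1),
  button       = c(0, 1, 1, 0, 0, 0)
)

key <- c("rib", "button")

res <- df %>%
  group_by(fact_code, style_serial) %>%
  # For every column named in key, take the group-wise maximum and
  # store it in a new column called <column>_s
  mutate(across(all_of(key), ~ max(.x, na.rm = TRUE), .names = "{.col}_s")) %>%
  ungroup()
```

Here all_of(key) selects columns from a character vector, and .names controls the names of the new columns, so no renaming step is needed afterwards.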


Sample data

df <- read.table(text =
    "fact_code style_serial ss rib button rib_s button_s
1008      style_1018   1   0  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   1  0      1     1
1008      style_1018   0   0  1      1     1
1008      style_1003   1   0  1      0     1
1008      style_1003   0   0  1      0     1
1008      style_1003   0   0  0      0     1
1008      style_1003   0   0  0      0     1
1004      style_1197   1   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   0  0      1     0
1004      style_1197   0   1  0      1     0", header = TRUE)
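If an explicit loop is preferred, as in the question, one possible sketch uses the .data pronoun to look up a column by its name and !! with := to build the new column name. This assumes a dplyr version that supports tidy evaluation (0.7 or newer); the data frame is again a hypothetical reduced stand-in for the sample data.

```r
library(dplyr)

# Hypothetical stand-in for the sample data: same column names, fewer rows
df <- data.frame(
  fact_code    = c(1008, 1008, 1008, 1008, 1004, 1004),
  style_serial = c("style_1018", "style_1018", "style_1003", "style_1003",
                   "style_1197", "style_1197"),
  rib          = c(0, 1, 0, 0, 0, 1),
  button       = c(0, 1, 1, 0, 0, 0)
)

for (xx in c("rib", "button")) {
  # Build the output column name first, outside the pipe
  yy <- paste0(xx, "_s")
  df <- df %>%
    group_by(fact_code, style_serial) %>%
    # .data[[xx]] refers to the column whose name is stored in xx;
    # !!yy := uses the string in yy as the name of the new column
    mutate(!!yy := max(.data[[xx]], na.rm = TRUE)) %>%
    ungroup()
}
```

The main difference from the attempt in the question is that the assignment to yy happens before the pipe rather than inside it, so mutate() always receives the grouped data frame rather than a character string.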

1 Comment

key <- c("rib", "button")
names(key) <- paste0(key, '_s')
df <- df %>%
    group_by(fact_code, style_serial) %>%
    mutate_at(.vars = key, funs(max(., na.rm = T)))
