I have a data frame similar to this:

> var1<-c("01","01","01","02","02","02","03","03","03","04","04","04")
> var2<-c("0","4","6","8","3","2","5","5","7","7","8","9")
> var3<-c("07","41","60","81","38","22","51","53","71","72","84","97")
> var4<-c("107","241","360","181","238","222","351","453","171","372","684","197")
> df<-data.frame(var1,var2,var3,var4)
> df
   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    8   81  181
5    02    3   38  238
6    02    2   22  222
7    03    5   51  351
8    03    5   53  453
9    03    7   71  171
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

I want to replace every value of var2, var3 and var4 with zeros in the rows where var1 is 02 or 03. The number of digits must stay the same, so that df looks like this:

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

I also need the command to run without error even if var1 contains none of these codes. Basically something like: if var1 contains 01 or 02, set the corresponding values in var2, var3 and var4 to zeros according to the number of digits in each value (e.g. 97 becomes 00 and 197 becomes 000), and if not, do nothing.

Any suggestions?

3 Answers

One solution is to use mutate and case_when from dplyr, writing out the replacement string for each column:

library(dplyr)

df <- df %>%
  mutate(var2 = case_when(var1 %in% c('02','03') ~ '0',
                          TRUE ~ as.character(var2)),
         var3 = case_when(var1 %in% c('02','03') ~ '00',
                          TRUE ~ as.character(var3)),
         var4 = case_when(var1 %in% c('02','03') ~ '000',
                          TRUE ~ as.character(var4)))
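
The widths above are written out by hand for each column. As a rough sketch of a variant that derives the number of zeros from each value instead, assuming dplyr 1.0 or later (for across) and R 3.3 or later (for strrep), something like the following should work. The names targets and df_zeroed are chosen here purely for illustration.

```r
library(dplyr)

# Same example data as in the question, kept as character columns
df <- data.frame(
  var1 = c("01","01","01","02","02","02","03","03","03","04","04","04"),
  var2 = c("0","4","6","8","3","2","5","5","7","7","8","9"),
  var3 = c("07","41","60","81","38","22","51","53","71","72","84","97"),
  var4 = c("107","241","360","181","238","222","351","453","171","372","684","197"),
  stringsAsFactors = FALSE
)

targets <- c("02", "03")  # illustrative name: the var1 codes whose rows get zeroed

df_zeroed <- df %>%
  mutate(across(
    c(var2, var3, var4),
    # strrep("0", n) builds a string of n zeros, one per value,
    # so each replacement has the same width as the value it replaces
    ~ if_else(var1 %in% targets,
              strrep("0", nchar(as.character(.x))),
              as.character(.x))
  ))

df_zeroed
```

If none of the codes in targets occur in var1, the condition is FALSE for every row and the columns come back unchanged.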

1 Comment

Exactly what I needed. Thank you!

Here is an idea that works dynamically for any number of columns and any number of digits. The trick is to make sure you have character variables (instead of factors) and to use sprintf with a width based on the maximum nchar of each column:

# Convert to character (in case the columns are factors)
df[] <- lapply(df, as.character)
# Set the values to 0 in the rows that match your condition
df[df$var1 %in% c('02', '03'), -1] <- 0
# Add leading 0s, padding each column to its maximum width, to restore the original format
df[-1] <- mapply(function(width, col) {
                   padded <- sprintf(paste0('%0', width, 's'), col)
                   gsub(' ', '0', padded)
                 },
                 sapply(df[-1], function(col) max(nchar(col))), df[-1])

which gives,

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197
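
If the data frame has further columns that must stay untouched, the same idea can be restricted to a set of columns selected by name. The following is only a sketch on a cut-down version of the example data. The extra column keep and the helper names cols and rows are made up for illustration, and the padding is done with strrep (R 3.3 or later) rather than sprintf.

```r
# Cut-down example data plus a hypothetical column that must not be changed
df <- data.frame(
  var1 = c("01", "02", "03", "04"),
  var2 = c("4", "8", "5", "7"),
  var3 = c("41", "81", "51", "72"),
  var4 = c("241", "181", "351", "372"),
  keep = c("a1", "b2", "c3", "d4"),
  stringsAsFactors = FALSE
)

cols <- c("var2", "var3", "var4")     # columns to modify, chosen by name
rows <- df$var1 %in% c("02", "03")    # rows that match the condition

# Make sure the target columns are character, then set matching rows to "0"
df[cols] <- lapply(df[cols], as.character)
df[rows, cols] <- "0"

# Left-pad every value with zeros up to the widest value in its column
df[cols] <- lapply(df[cols], function(x) {
  width <- max(nchar(x))
  paste0(strrep("0", width - nchar(x)), x)
})

df
```

As in the code above, every value is padded to the widest value in its column, so this assumes all values in a column are meant to have the same number of digits.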

2 Comments

Interesting suggestion. I can't apply this to my original data (there are more columns that shouldn't be converted to 0, which of course you could not know from my question), but I think this can be very useful in the future. Thanks!
You can exclude unwanted columns. For example, in this case I exclude the first column (that's why I have the -1, i.e. df[-1]). If you replace the -1 above with the index of columns you want to exclude, it should work just fine

If you want it to automatically make as many zeros as there are digits in the variable, you can use something like this:

# define a function: where con is TRUE, replace the value with as many zeros as it has digits
val_to_zero <- function(con, val) {
  val <- as.character(val)  # keeps factor codes from leaking through ifelse()
  ifelse(con, strrep("0", nchar(val)), val)
}
# define the condition
con <- df$var1 %in% c("01", "02")
# choose which columns to change
vars <- names(df)[2:4]
# apply the function to each chosen column
df[vars] <- lapply(df[vars], val_to_zero, con = con)
# done
df

With this function you do not need to specify by hand how many zeros each column needs. So if var5 is c("292992", ...), it still works.
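
The question also asks that the command run safely when none of the codes occur in var1. A small sketch of how that could be checked is below. The function zero_out and its arguments are hypothetical names used only for this illustration, and strrep needs R 3.3 or later.

```r
# Hypothetical helper: zero out `cols` in the rows where `key` matches one of `codes`
zero_out <- function(data, cols, codes, key = "var1") {
  hit <- data[[key]] %in% codes            # all FALSE when no code matches
  for (col in cols) {
    x <- as.character(data[[col]])
    # one "0" per character of the original value; a no-op when hit is all FALSE
    x[hit] <- strrep("0", nchar(x[hit]))
    data[[col]] <- x
  }
  data
}

# Cut-down version of the example data
df <- data.frame(
  var1 = c("01", "02", "04"),
  var2 = c("4", "8", "7"),
  var3 = c("41", "81", "72"),
  var4 = c("241", "181", "372"),
  stringsAsFactors = FALSE
)

vars <- c("var2", "var3", "var4")

# Codes that do not occur in var1: the data should come back unchanged
unchanged <- zero_out(df, vars, c("05", "06"))
stopifnot(identical(unchanged, df))

# Codes that do occur: only the matching row is replaced
changed <- zero_out(df, vars, c("02", "03"))
changed
```

Because %in% simply returns FALSE for every row when nothing matches, no separate if/else check is needed around the replacement.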

2 Comments

Even with just two conditions, you may want to consider df$var1 %in% c("01", "02") rather than df$var1 == "01" | df$var1 == "y". It's less typing and extends more easily if there are more cases.
I find the solution with dplyr from @camnesia a bit more straightforward. But also thanks for this suggestion. Maybe other users prefer this option.
