Replace values in multiple columns by values of another column on condition

Question

I have a data frame similar to this:

> var1<-c("01","01","01","02","02","02","03","03","03","04","04","04")
> var2<-c("0","4","6","8","3","2","5","5","7","7","8","9")
> var3<-c("07","41","60","81","38","22","51","53","71","72","84","97")
> var4<-c("107","241","360","181","238","222","351","453","171","372","684","197")
> df<-data.frame(var1,var2,var3,var4)
> df
   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    8   81  181
5    02    3   38  238
6    02    2   22  222
7    03    5   51  351
8    03    5   53  453
9    03    7   71  171
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

I want to replace all values of the variables var2, var3 and var4 with "0" in the rows where var1 is 02 or 03. The number of digits also needs to stay the same, so that df looks like this:

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197

Now, I also need to be sure the command still runs even if var1 does not contain 02 or 03. Basically something like: if var1 contains 01 or 02, set the corresponding values in var2, var3 and var4 to 0, keeping the number of digits of each value (e.g. 97 becomes 00 and 197 becomes 000); if not, do nothing.

Any suggestions?

camnesia · Accepted Answer · 2020-03-13 14:16:58Z

1

One solution is to use mutate() and case_when() from dplyr:

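library(dplyr)

# Rows where var1 is 02 or 03 get a zero string of the column's width;
# all other rows (the TRUE ~ fallback) keep their value, as character
df <- df %>%
  mutate(var2 = case_when(var1 %in% c('02','03') ~ '0',
                          TRUE ~ as.character(var2)),
         var3 = case_when(var1 %in% c('02','03') ~ '00',
                          TRUE ~ as.character(var3)),
         var4 = case_when(var1 %in% c('02','03') ~ '000',
                          TRUE ~ as.character(var4)))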
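For readers without dplyr, a minimal base R sketch of the same fixed-width replacement uses ifelse(); zero_rows() is a hypothetical helper name, and the zero strings are hard-coded per column just as in the answer. It also illustrates the question's second requirement: when none of the codes occur in var1, %in% returns all FALSE and the data comes back unchanged.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(
  var1 = c("01","01","01","02","02","02","03","03","03","04","04","04"),
  var2 = c("0","4","6","8","3","2","5","5","7","7","8","9"),
  var3 = c("07","41","60","81","38","22","51","53","71","72","84","97"),
  var4 = c("107","241","360","181","238","222","351","453","171","372","684","197"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: in rows whose var1 is in `codes`, overwrite each listed
# column with its fixed zero string; all other rows keep their values
zero_rows <- function(d, codes, zeros = c(var2 = "0", var3 = "00", var4 = "000")) {
  hit <- d$var1 %in% codes  # all FALSE when none of the codes occur
  for (col in names(zeros)) {
    d[[col]] <- ifelse(hit, zeros[[col]], as.character(d[[col]]))
  }
  d
}

out <- zero_rows(df, c("02", "03"))
out
# Codes that do not occur in var1 leave the data unchanged
all.equal(zero_rows(df, "99"), df)
```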

answered Mar 13, 2020 at 14:16

camnesia


1 Comment

Lutz Over a year ago

Exactly what I needed. Thank you!

Sotos · Accepted Answer · 2020-03-13 15:00:15Z

1

Here is an idea that does this dynamically for any number of columns and any number of digits. The trick is to make sure you have character variables (instead of factors) and to use sprintf with a field width equal to the maximum nchar of each column, i.e.

# Convert to character (if they are factors)
df[] <- lapply(df, as.character)
# Record each column's width before overwriting anything
widths <- sapply(df[-1], function(i) max(nchar(i)))
# Convert values to 0 as per your condition
df[df$var1 %in% c('02', '03'), -1] <- 0
# Add leading 0s to bring the values back to their original width
df[-1] <- mapply(function(x, y) {
  i1 <- sprintf(paste0('%', x, 's'), y)  # right-align in a field of width x
  gsub(' ', '0', i1)                     # turn the padding spaces into 0s
}, widths, df[-1])

which gives,

   var1 var2 var3 var4
1    01    0   07  107
2    01    4   41  241
3    01    6   60  360
4    02    0   00  000
5    02    0   00  000
6    02    0   00  000
7    03    0   00  000
8    03    0   00  000
9    03    0   00  000
10   04    7   72  372
11   04    8   84  684
12   04    9   97  197
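The padding step can be checked in isolation. In this sketch, pad_zero() is a hypothetical name: it right-aligns each value in a field of the given width with sprintf() and then turns the padding spaces into zeros. Padding with spaces and converting them via gsub() avoids relying on the 0 flag, which ?sprintf describes as zero-padding strings on some platforms and being ignored on others. Note that gsub() would also turn any space inside a value into 0, which is harmless for digit strings like these.

```r
# Hypothetical helper: left-pad character values with zeros to `width` characters
pad_zero <- function(x, width) {
  padded <- sprintf(paste0("%", width, "s"), x)  # e.g. "%3s": right-align with spaces
  gsub(" ", "0", padded, fixed = TRUE)           # replace the padding spaces with 0s
}

pad_zero("0", 3)            # "000"
pad_zero(c("7", "97"), 2)   # "07" "97"

# Width taken from the data itself, as in the answer
vals <- c("107", "0", "197")
pad_zero(vals, max(nchar(vals)))   # "107" "000" "197"
```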

answered Mar 13, 2020 at 15:00

Sotos


2 Comments

Lutz Over a year ago

Interesting suggestion. I can't apply this to my original data (there are more columns that shouldn't be converted to 0, but of course you could not know that from my question), but I think this can be very useful in the future. Thanks!

Sotos Over a year ago

You can exclude unwanted columns. For example, in this case I exclude the first column (that's why I have the -1, i.e. df[-1]). If you replace the -1 above with the negated indices of all the columns you want to exclude (e.g. -c(1, 5)), it should work just fine.
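A sketch of the same idea that picks the target columns by name rather than by position; the column note is made up for illustration and stands for a column that must not be zeroed. Negative indices such as -c(1, 4) would select the same columns positionally.

```r
df <- data.frame(
  var1 = c("01", "02", "03", "04"),
  var2 = c("0", "8", "5", "7"),
  var3 = c("07", "81", "51", "72"),
  note = c("a", "b", "c", "d"),   # hypothetical column that must stay as-is
  stringsAsFactors = FALSE
)

# Columns to leave alone: the key column plus any others, by name
keep <- c("var1", "note")
target <- setdiff(names(df), keep)

# Same steps as the answer, restricted to the target columns
widths <- sapply(df[target], function(i) max(nchar(i)))
df[df$var1 %in% c("02", "03"), target] <- 0
df[target] <- mapply(function(x, y) gsub(" ", "0", sprintf(paste0("%", x, "s"), y)),
                     widths, df[target])
df
```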

user11538509 · Accepted Answer · 2020-03-13 15:49:08Z

1

If you want it to automatically produce as many zeros as there are digits in each variable, you can use something like this:

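# define a function: where con is TRUE, replace val by a string of zeros as long
# as the values in the column (assumes they all have the same number of characters);
# otherwise keep val. as.character() stops ifelse() from returning factor codes
val_to_zero <- function(con, val) {
  val <- as.character(val)
  zeros <- paste0(rep("0", unique(nchar(val))), collapse = "")
  ifelse(con, zeros, val)
}
# define the condition
con <- df$var1 %in% c("01", "02")
# choose which columns to change
vars <- names(df)[2:4]
# apply the function to each column
df[vars] <- lapply(df[vars], function(var_i) val_to_zero(con, var_i))
# done
df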

With this function you do not need to specify by hand how many zeros to use for each column. So if var5 were c("292992", ...), it would still work.
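The function above builds one zero string per column, so it assumes every value in a column has the same number of characters (unique(nchar(...)) must return a single number for rep()). A sketch that drops that assumption builds the zero string per value with strrep(); val_to_zero_each() and the sample values are hypothetical.

```r
# Hypothetical per-value variant: each value gets a zero string of its own length
val_to_zero_each <- function(con, val) {
  val <- as.character(val)                    # guard against factor columns
  ifelse(con, strrep("0", nchar(val)), val)   # keep val where con is FALSE
}

var1 <- c("01", "02", "04")
var5 <- c("292992", "7", "12")   # illustrative values of differing width
val_to_zero_each(var1 %in% c("01", "02"), var5)
# "000000" "0" "12"
```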

edited Mar 13, 2020 at 15:49

answered Mar 13, 2020 at 14:51

user11538509

2 Comments

Gregor Thomas Over a year ago

Even with just two conditions, you may want to consider df$var1 %in% c("01", "02") rather than df$var1 == "01" | df$var1 == "y". It's less typing and extends more easily if there are more cases.
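One further difference between the two forms: %in% never returns NA, while == propagates NA, which matters if var1 can contain missing values (the vector x below is illustrative). With ==, such rows yield NA in the condition and have to be handled separately.

```r
x <- c("01", "02", NA, "04")

x %in% c("01", "02")    # TRUE  TRUE FALSE FALSE
x == "01" | x == "02"   # TRUE  TRUE    NA FALSE
```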

Lutz Over a year ago

I find the solution with dplyr from @camnesia a bit more straightforward. But also thanks for this suggestion. Maybe other users prefer this option.
