Conditionally replace values of multiple columns, from values of other multiple columns

Question

Suppose I have this dataset:

set.seed (1234); 
data.frame(cbind(a=rep(c("si","no"),30),b=rnorm(60)),
           c=rep(c("d","e","f"),20))  %>% head()

Then I want to add many columns (in this example I only added two), to identify distinct cases between each group (in this case, column "a").

set.seed(1234); 
data.frame(cbind(a=rep(c("si","no"),30),b=rnorm(60)),c=rep(c("d","e","f"),20)) %>% 
group_by(a) %>% dplyr::mutate_at(vars(c(b,c)), .funs= list(dups_hash_ing= ~n_distinct(.)))

This code leaves the following dataset:

If I set the dataset with dput, the outcome is

structure(list(a = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L), .Label = c("no", "si"), class = "factor"), b = structure(c(22L, 
1L, 51L, 34L, 50L, 57L, 53L, 10L, 47L, 3L, 11L, 23L, 15L, 38L, 
58L, 39L, 41L, 17L, 28L, 21L, 37L, 45L, 29L, 46L, 32L, 48L, 56L, 
52L, 26L, 19L, 35L, 8L, 55L, 20L, 9L, 36L, 2L, 12L, 6L, 42L, 
49L, 43L, 59L, 54L, 31L, 13L, 60L, 44L, 14L, 30L, 7L, 5L, 16L, 
27L, 33L, 18L, 24L, 4L, 25L, 40L), .Label = c("-0.0997905884418961", 
"-0.151736536534977", "-0.198416273822079", "-0.254874652654534", 
"-0.274704218225806", "-0.304721068966714", "-0.324393300483657", 
"-0.400235237343163", "-0.415751788401515", "-0.50873701541522", 
"-0.538070788884863", "-0.60615111526422", "-0.659770093821306", 
"-0.684320344136007", "-0.789646852263761", "-0.933503340589868", 
"-0.965903210133575", "-1.07754212275943", "-1.11444896479736", 
"-1.60708093984972", "-2.07823754188738", "-2.7322195229558", 
"-2.85575865501923", "-3.23315213292314", "0.0295178303214797", 
"0.0326639575014441", "0.116845344986082", "0.162654708118265", 
"0.185513915583057", "0.186492083080971", "0.287709728313787", 
"0.311681028661359", "0.319160238648117", "0.413868915451097", 
"0.418057822385083", "0.42200837321742", "0.485226820569252", 
"0.487814635163685", "0.500694614280786", "0.594273774110513", 
"0.62021020366732", "0.629536099884472", "0.660212631820405", 
"0.677415500438328", "0.696768778564913", "0.700733515544461", 
"0.704180178465512", "0.760462361967838", "0.895171980275539", 
"0.912322161610113", "0.976031734922396", "1.1123628412626", 
"1.16910851401363", "1.17349757263239", "1.49349310261748", "1.84246362620766", 
"1.98373220068438", "2.16803253951933", "2.27348352044748", "2.91914013071762"
), class = "factor"), c = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L), .Label = c("d", "e", "f"), class = "factor"), 
    a_dups_hash_ing = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), b_dups_hash_ing = c(30L, 30L, 30L, 30L, 
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 
    30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L), c_dups_hash_ing = c(3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -60L), groups = structure(list(
    a = structure(1:2, .Label = c("no", "si"), class = "factor"), 
    .rows = list(c(2L, 4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L, 20L, 
    22L, 24L, 26L, 28L, 30L, 32L, 34L, 36L, 38L, 40L, 42L, 44L, 
    46L, 48L, 50L, 52L, 54L, 56L, 58L, 60L), c(1L, 3L, 5L, 7L, 
    9L, 11L, 13L, 15L, 17L, 19L, 21L, 23L, 25L, 27L, 29L, 31L, 
    33L, 35L, 37L, 39L, 41L, 43L, 45L, 47L, 49L, 51L, 53L, 55L, 
    57L, 59L))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

What I need to do, is replace, column by column, if the number of distinct cases is more than one per group, with the value of the original column. I have to do this for more than 50 columns. An example of this will be provided for only one column with mutate:

  dplyr::mutate(b_dups_hash_ing= ifelse(>1,b,0))

I need to repeat the code provided above for many variables. This is very similar to a mutate_at (words in brackets is what I would do). The following example does not work, but is something I would do in an ideal world, just for your better understanding of my problem.

dplyr::mutate_at(vars(contains('_dups_hash_ing')), .funs = list(~ifelse(.>1,vars([original]),0)))

Please make this reproducible by setting a seed with set.seed([some number here]) — JasonAizkalns
– JasonAizkalns, Commented Feb 20, 2020 at 20:32
Thank you for your response. However, the output of the example is not relevant. I'm particularly interested in the function that makes it possible to replace values in the columns. I will do my best trying to edit the question, I don't know how to do it. — Andrés González
– Andrés González, Commented Feb 20, 2020 at 20:51
@JasonAizkalns I did what you asked. I added the seed. Thank you. — Andrés González
– Andrés González, Commented Feb 20, 2020 at 20:58
Try using dput(dataset), where dataset might be a dataframe. — massisenergy
– massisenergy, Commented Feb 20, 2020 at 20:59
That's not me, just tried to make the question more comprehensible, with the suggestion. — massisenergy
– massisenergy, Commented Feb 20, 2020 at 22:03

pgcudahy · Accepted Answer · 2020-02-21 08:23:27Z

2

Is this what you're looking for?

df %>% dplyr::mutate_at(vars(contains('_dups_hash_ing')), ~ ifelse(. > 1, ., 0)) %>% head
#> # A tibble: 6 x 6
#> # Groups:   a [2]
#>   a     b                  c     a_dups_hash_ing b_dups_hash_ing c_dups_hash_ing
#>   <fct> <fct>              <fct>           <dbl>           <int>           <int>
#> 1 si    -2.7322195229558   d                   0              30               3
#> 2 no    -0.09979058844189… e                   0              30               3
#> 3 si    0.976031734922396  f                   0              30               3
#> 4 no    0.413868915451097  d                   0              30               3
#> 5 si    0.912322161610113  e                   0              30               3
#> 6 no    1.98373220068438   f                   0              30               3

answered Feb 21, 2020 at 8:23

pgcudahy

1,60116 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Andrés González Over a year ago

Thank you, but I need to do something like this: df %>% dplyr::mutate(b_dups_hash_ing=ifelse(. > 1,b, 0)) %>% head, but for more than 50 columns (I was wondering to do this by the function mutate_at, but I can't imagine how to do it).

pgcudahy Over a year ago

Is there a predicate function that can be passed to mutate_if to select the columns you want or can you simply pass a numeric vector of positions to mutate_at(vars())?

Andrés González Over a year ago

I can simply pass a numeric vector of positions. I have a vector of names of columns, which are the original ones (c(b,c)), and a vector of column names that I created that should be of the same length, which are the ones that contain the suffix "_dup_hash_ing" (contains('_dups_hash_ing')). This is what I did in the previous step: dplyr::mutate_at(vars(c(b,c)), .funs= list(dups_hash_ing= ~n_distinct(.))). Now that i can detect if there are more than one case per group (in column a), I can replace values with the original ones if there are more than one case per group.

Andrés González Over a year ago

Sorry. I dont speak english very well. I need to replace values in variables that contains "_dup_hash_ing" ("b_dup_hash_ing" and "c_dup_hash_ing") with the original variable ("b" and "c"), for each one, if the values of "_dup_hash_ing" are greater than 1.

Andrés González Over a year ago

I need a function that can do something like this: dplyr::mutate_at(vars(c(b,c)), .funs= funs(dups_hash_ing=ifelse(n_distinct(.)>1, as.character(.),""))) but by giving me the distinct values per row, no by group (a) (group_by(a))

Collectives™ on Stack Overflow

Conditionally replace values of multiple columns, from values of other multiple columns

1 Answer 1

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related