
I have a dataset structured as follows:

structure(list(ID = structure(1:26, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", 
"q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
    start.date = structure(c(16583, 16567, 16475, 16507, 16787, 
    16733, 16614, 16458, 16713, 16689, 16497, 16741, 16527, 16703, 
    16555, 16612, 16727, 16656, 16461, 16758, 16536, 16482, 16604, 
    16438, 16489, 16470), class = "Date"), end.date = structure(c(17162, 
    16973, 16840, 16915, 16870, 17076, 16815, 16901, 16832, 17108, 
    16866, 16963, 17086, 16859, 16947, 17060, 17063, 16984, 16882, 
    17087, 16865, 16811, 16871, 16987, 16938, 16910), class = "Date"), 
    B = c(1.01037117881286, -0.468494103910475, 2.1029305771466, 
    0.541203301346257, 0.48316300974706, -0.15634066165743, 0.0748809270194454, 
    0.342283686647221, 1.36029900729214, -0.265980006971409, 
    -0.200070929944069, -0.778013502221203, -1.95834433790751, 
    0.981791345936073, 0.806205039682571, 0.211808478113909, 
    -0.718520854351539, -1.41251704545666, 0.132766895582887, 
    -1.17600286793503, -1.69832803867181, 0.642400945149099, 
    -1.64248957354041, 1.80252672879676, 0.318451979178807, 0.59025890995253
    ), C = c(6L, 12L, 14L, 9L, 4L, 17L, 14L, 7L, 11L, 5L, 17L, 
    10L, 11L, 5L, 17L, 10L, 9L, 6L, 4L, 5L, 12L, 9L, 9L, 8L, 
    14L, 10L), D = structure(c(26L, 25L, 24L, 23L, 22L, 21L, 
    20L, 19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("A", "B", "C", 
    "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", 
    "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"), class = "factor"), 
    E = c(1L, 0L, 0L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 0L, 1L, 0L, 
    0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L)), .Names = c("ID", 
"start.date", "end.date", "B", "C", "D", "E"), row.names = c(NA, 
-26L), class = "data.frame")

Note that this is only a sample; the real dataset is much larger. I have written this function:

restandard <- function(x, CID.start, CID.end){
  time_length <- difftime(CID.end, CID.start, units = "days")/365.25
  replace_x   <- x/time_length
}

I would like to apply restandard to the numeric variables of my dataset only, which essentially means I need to use mutate_if.
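For reference, here is a quick base-R check of which columns is.numeric actually selects. The toy data frame is hypothetical and only mimics the column types of the sample (factor, Date, double, integer). Date columns return FALSE from is.numeric, so only B, C and E would be rescaled.

```r
# Hypothetical toy frame with the same column types as the sample data
toy <- data.frame(
  ID = factor(c("a", "b")),
  start.date = as.Date(c("2015-05-28", "2015-05-12")),
  end.date = as.Date(c("2016-12-27", "2016-06-21")),
  B = c(1.01, -0.47),   # double
  C = c(6L, 12L),       # integer
  D = factor(c("Z", "Y")),
  E = c(1L, 0L)         # integer
)

# Filter() keeps only the columns for which the predicate returns TRUE
numeric_cols <- names(Filter(is.numeric, toy))
print(numeric_cols)
```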

I have tried the following, to no avail:

df %>% mutate_if(~is.numeric(.x), restandard(.x, CID.start = start.date, CID.end = end.date))

Is there any way to do this? In my real data there are 25 columns I want to apply this to. Thanks.

  • Try mutate_if(df, is.numeric, funs(restandard(., CID.start = start.date, CID.end = end.date))), but there is a problem in your restandard function. Commented Aug 23, 2018 at 13:55
  • Note that ~ is a shortcut for writing an anonymous function within the mutate_if call, so if you use it, it goes in front of the function. But if you are using a named function, the easiest approach is to pass just the function name and supply the additional arguments through ..., as shown in Ajar's answer. Commented Aug 23, 2018 at 14:52
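The same pattern (pass the function by name and forward the extra arguments through ...) also works in base R with lapply(). The sketch below assumes the corrected restandard from the first answer and uses a small hypothetical data frame. The dates are chosen so that the spans are exactly 4 and 8 years of 365.25 days.

```r
# Corrected restandard: strip the difftime class before dividing
restandard <- function(x, CID.start, CID.end) {
  time_length <- difftime(CID.end, CID.start, units = "days") / 365.25
  x / as.numeric(time_length)
}

# Hypothetical toy data (not the real dataset): spans of 1461 and 2922 days
toy <- data.frame(
  start.date = as.Date(c("2015-01-01", "2015-01-01")),
  end.date   = as.Date(c("2019-01-01", "2023-01-01")),
  B = c(2, 4),
  C = c(4L, 8L)
)

num <- vapply(toy, is.numeric, logical(1))  # TRUE for B and C only
# lapply() forwards CID.start and CID.end to restandard through its ... argument
toy[num] <- lapply(toy[num], restandard,
                   CID.start = toy$start.date, CID.end = toy$end.date)
print(toy)
```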

2 Answers


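You were pretty close, but your restandard function also had a bug: x / time_length divides by a difftime object, which R does not allow, so time_length has to be converted with as.numeric() first. Using your sample data: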

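library(dplyr)

restandard <- function(x, CID.start, CID.end){
  # difftime / 365.25 is still a difftime, so strip the class before dividing
  time_length <- difftime(CID.end, CID.start, units = "days") / 365.25
  x / as.numeric(time_length)
}

# `.` is the piped data frame; its date columns are passed to restandard via ...
df %>% mutate_if(is.numeric, restandard, CID.start = .$start.date, CID.end = .$end.date)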

This gives (first six rows):

   ID start.date   end.date          B         C D         E
1   a 2015-05-28 2016-12-27  0.6373715  3.784974 Z 0.6308290
2   b 2015-05-12 2016-06-21 -0.4214716 10.795567 Y 0.0000000
3   c 2015-02-09 2016-02-09  2.1043709 14.009589 X 0.0000000
4   d 2015-03-13 2016-04-24  0.4844963  8.056985 W 0.8952206
5   e 2015-12-18 2016-03-10  2.1262083 17.602410 V 8.8012048
6   f 2015-10-25 2016-10-02 -0.1664823 18.102770 U 0.0000000
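Two quick base-R checks. First, why the original version failed: R's / method for difftime objects stops with an error when the difftime is the denominator, which is exactly what x / time_length did. Second, a hand calculation for row a (579 days between 2015-05-28 and 2016-12-27) reproduces the B value in the output above.

```r
# Year span for row a, still a difftime object
span <- difftime(as.Date("2016-12-27"), as.Date("2015-05-28"), units = "days") / 365.25

# Dividing a plain number by a difftime raises an error; capture its message
err <- tryCatch(1.01037117881286 / span, error = function(e) conditionMessage(e))
print(err)

# Converting to numeric first, as the fixed restandard does, works
b_a <- 1.01037117881286 / as.numeric(span)
print(round(b_a, 7))
```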

3 Comments

They aren't doing any subsetting, but that's fair.
What do you mean by your comment?
There had been another comment here that prompted me to use .$ instead of df$, so I changed that, upvoted it, & replied. Not sure why they deleted the comment.

It is easier to write a function that captures the column arguments as quosures with enquo() and then evaluates them with !! inside the mutate_if call:

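library(dplyr)

f1 <- function(dat, CID.start, CID.end){
  # capture the unevaluated column arguments as quosures
  CID.start <- enquo(CID.start)
  CID.end   <- enquo(CID.end)
  dat %>%
    mutate_if(is.numeric,
              # !! unquotes the captured columns inside funs()
              funs(. / as.numeric(difftime(!! CID.end, !! CID.start, units = 'days') / 365.25)))
}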

f1(df, start.date, end.date)
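Both answers rely on mutate_if() and funs(). In later dplyr releases funs() is deprecated (since 0.8.0) and the scoped _if verbs are superseded by across() (since 1.0.0). The sketch below does the same rescaling with across() and where(), assuming a recent dplyr release. The year span is computed beforehand as a plain vector (the hypothetical name yrs), so the lambda does not need to look up other columns; the toy data is also hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not the real dataset)
toy <- data.frame(
  start.date = as.Date(c("2015-01-01", "2015-01-01")),
  end.date   = as.Date(c("2019-01-01", "2023-01-01")),
  B = c(2, 4),
  C = c(4L, 8L)
)

# Year span per row as a plain numeric vector (4 and 8 here)
yrs <- as.numeric(difftime(toy$end.date, toy$start.date, units = "days")) / 365.25

# across(where(is.numeric), ...) plays the role of mutate_if(is.numeric, ...)
out <- toy %>% mutate(across(where(is.numeric), ~ .x / yrs))
print(out)
```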

2 Comments

What is a quosure, and are there any tutorials on it? I have never heard of it.
@akash87 You can check the vignettes here
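In short, a quosure bundles an expression with the environment it was written in, so it can be evaluated later, for example with a data frame's columns in scope. The sketch below uses rlang, the package dplyr builds on; show_col and toy are hypothetical names. In f1, !! plays a similar role by injecting the captured quosures into the funs() call.

```r
library(rlang)

show_col <- function(dat, col) {
  q <- enquo(col)           # capture the unevaluated argument as a quosure
  eval_tidy(q, data = dat)  # evaluate it with dat's columns in scope
}

toy <- data.frame(a = 1:3)
res <- show_col(toy, a * 2)  # `a` is resolved as a column of toy
print(res)
```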
