Using a custom function in dplyr process

Question

I have a dataset structured as

structure(list(ID = structure(1:26, .Label = c("a", "b", "c", 
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", 
"q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
    start.date = structure(c(16583, 16567, 16475, 16507, 16787, 
    16733, 16614, 16458, 16713, 16689, 16497, 16741, 16527, 16703, 
    16555, 16612, 16727, 16656, 16461, 16758, 16536, 16482, 16604, 
    16438, 16489, 16470), class = "Date"), end.date = structure(c(17162, 
    16973, 16840, 16915, 16870, 17076, 16815, 16901, 16832, 17108, 
    16866, 16963, 17086, 16859, 16947, 17060, 17063, 16984, 16882, 
    17087, 16865, 16811, 16871, 16987, 16938, 16910), class = "Date"), 
    B = c(1.01037117881286, -0.468494103910475, 2.1029305771466, 
    0.541203301346257, 0.48316300974706, -0.15634066165743, 0.0748809270194454, 
    0.342283686647221, 1.36029900729214, -0.265980006971409, 
    -0.200070929944069, -0.778013502221203, -1.95834433790751, 
    0.981791345936073, 0.806205039682571, 0.211808478113909, 
    -0.718520854351539, -1.41251704545666, 0.132766895582887, 
    -1.17600286793503, -1.69832803867181, 0.642400945149099, 
    -1.64248957354041, 1.80252672879676, 0.318451979178807, 0.59025890995253
    ), C = c(6L, 12L, 14L, 9L, 4L, 17L, 14L, 7L, 11L, 5L, 17L, 
    10L, 11L, 5L, 17L, 10L, 9L, 6L, 4L, 5L, 12L, 9L, 9L, 8L, 
    14L, 10L), D = structure(c(26L, 25L, 24L, 23L, 22L, 21L, 
    20L, 19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 
    8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("A", "B", "C", 
    "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", 
    "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"), class = "factor"), 
    E = c(1L, 0L, 0L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 0L, 1L, 0L, 
    0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L)), .Names = c("ID", 
"start.date", "end.date", "B", "C", "D", "E"), row.names = c(NA, 
-26L), class = "data.frame")

Note that this is a sample that is much smaller than the real dataset. I have written a function:

restandard    <- function(x, CID.start, CID.end){
                 time_length <- difftime(CID.end, CID.start, units = "days")/365.25
                 replace_x   <- x/time_length
}

I would like to apply restandard to the numeric variables of my dataset only, which essentially means I need to use mutate_if.

I have tried the following, to no avail:

df %>% mutate_if(~is.numeric(.x), restandard(.x, CID.start = start.date, CID.end = end.date))

Any way to do this? My actual problem is that there are 25 columns I want to apply this to. Thanks.

Try mutate_if(df, is.numeric, funs(restandard(., CID.start = start.date, CID.end = end.date))), but note that there is a problem in your restandard function.
– Nicolas2, Commented Aug 23, 2018 at 13:55
Note that ~ is a shortcut for writing anonymous functions within the mutate_if call, so if you were using it, it would go in front of the function call. But if you are using a named function, the easiest thing is to give just the function name and supply the additional arguments through ..., as shown in Ajar's answer.
– see24, Commented Aug 23, 2018 at 14:52
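
To make the two calling styles from the comment above concrete, here is a minimal sketch on toy data. The data frame `toy` and the helper `scale_by` are hypothetical names made up for illustration; they are not part of the question.

```r
library(dplyr)

# Hypothetical toy data, not the question's dataset
toy <- data.frame(id = c("a", "b"), x = c(2, 4), y = c(10, 20))

# Hypothetical helper: divide a column by a fixed divisor
scale_by <- function(x, divisor) x / divisor

# Named function: extra arguments are passed on through `...`
res_named <- toy %>% mutate_if(is.numeric, scale_by, divisor = 2)

# Formula shortcut: `~` goes in front of the call and `.x` stands for the column
res_formula <- toy %>% mutate_if(is.numeric, ~ scale_by(.x, divisor = 2))
```

Both calls should leave the non-numeric `id` column alone and halve `x` and `y`.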

Ajar · Accepted Answer · 2018-08-23 14:03:09Z

4

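You were pretty close; your restandard function also had a bug. The corrected version below converts the difftime to a plain number with as.numeric() before dividing, and returns the quotient instead of assigning it to a local variable. Using your sample data: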

restandard    <- function(x, CID.start, CID.end){
  time_length <- difftime(CID.end, CID.start, units = "days") / 365.25
  x / as.numeric(time_length)
}

df %>% mutate_if(is.numeric, restandard, CID.start = .$start.date, CID.end = .$end.date)

Which gives this (first six rows shown):

   ID start.date   end.date          B         C D         E
1   a 2015-05-28 2016-12-27  0.6373715  3.784974 Z 0.6308290
2   b 2015-05-12 2016-06-21 -0.4214716 10.795567 Y 0.0000000
3   c 2015-02-09 2016-02-09  2.1043709 14.009589 X 0.0000000
4   d 2015-03-13 2016-04-24  0.4844963  8.056985 W 0.8952206
5   e 2015-12-18 2016-03-10  2.1262083 17.602410 V 8.8012048
6   f 2015-10-25 2016-10-02 -0.1664823 18.102770 U 0.0000000
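
For readers on a newer dplyr, where mutate_if is superseded by across(), the same idea might be sketched as follows. This assumes dplyr 1.0.0 or later and is not part of the original answer; `toy` is a hypothetical two-row stand-in for the question's data.

```r
library(dplyr)

# Corrected function from the answer above
restandard <- function(x, CID.start, CID.end) {
  time_length <- difftime(CID.end, CID.start, units = "days") / 365.25
  x / as.numeric(time_length)
}

# Hypothetical toy data with the same column layout as the question
toy <- data.frame(
  ID = c("a", "b"),
  start.date = as.Date(c("2015-01-01", "2015-01-01")),
  end.date = as.Date(c("2016-01-01", "2017-01-01")),
  B = c(1, 4)
)

# Inside mutate(), the date columns can be referenced by name in the lambda
res <- toy %>%
  mutate(across(where(is.numeric), ~ restandard(.x, start.date, end.date)))
```

Date columns are not numeric as far as is.numeric() is concerned, so only `B` is rescaled here.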

edited Aug 23, 2018 at 14:03

answered Aug 23, 2018 at 13:57


3 Comments

Ajar Over a year ago

They aren't doing any subsetting, but that's fair.

akash87 Over a year ago

What do you mean by your comment?

Ajar Over a year ago

There had been another comment here that prompted me to use .$ instead of df$, so I changed that, upvoted it, & replied. Not sure why they deleted the comment.

akrun · Answer · 2018-08-23 13:54:11Z

2

It is easier to wrap the pipeline in a function, capture the date columns as quosures with enquo(), and unquote them with !! where they are used:

f1 <- function(dat, CID.start, CID.end){
  CID.start <- enquo(CID.start)
  CID.end   <- enquo(CID.end)
  dat %>%
    mutate_if(is.numeric,
              funs(. / as.numeric(difftime(!! CID.end, !! CID.start, units = 'days') / 365.25)))
}

f1(df, start.date, end.date)
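
As a smaller illustration of the enquo() / !! pattern on its own, here is a minimal sketch. The function `ratio_col` and the data frame `toy` are hypothetical and only serve to show how a bare column name is captured and later unquoted.

```r
library(dplyr)

# Hypothetical helper: takes bare column names and adds their ratio
ratio_col <- function(dat, num, den) {
  # enquo() captures the expression the caller typed (e.g. `a`) as a quosure
  num <- enquo(num)
  den <- enquo(den)
  # !! unquotes the quosure so mutate() evaluates it as a column of `dat`
  dat %>% mutate(ratio = (!!num) / (!!den))
}

# Hypothetical toy data
toy <- data.frame(a = c(2, 6), b = c(1, 3))
res <- ratio_col(toy, a, b)
```

The parentheses around each `!!` are there only to keep operator precedence unambiguous.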

answered Aug 23, 2018 at 13:54


2 Comments

akash87 Over a year ago

What is quosure and are there any tutorials on it? I have never heard of it.

akrun Over a year ago

@akash87 You can check the vignettes here
