Summarizing using function requiring multiple parameters in R

Question

I'm trying to get the area under the curve of some data for each run of a set of simulation runs. My data is of the form:

run    year    data1    data2    data3
---    ----    -----    -----    -----
1      2001    2.3      45.6     30.2
1      2002    2.4      35.4     23.4
1      2003    2.6      45.6     23.6
2      2001    2.3      45.6     30.2
2      2002    2.4      35.4     23.4
2      2003    2.6      45.6     23.6
3      2001 ... and so on

So, I'd like to get the area under the curve for each data trace for run 1, run 2, ... where the x axis is always the year column and the y axis is each data column. So, as output I want something like:

run    Data1_auc    Data2_auc    Data3_auc
---    ---------    ---------    ---------
1      4.5          6.7          27.5
2      3.4          6.8          35.4
3      4.5          7.8          45.6

(These are not the actual areas for the data above.)

I want to compute the area with the trapz function from the pracma package, which takes x and y values: trapz(x, y), where x = year in my case and y = one of the data columns.

I've tried

dataCols <- colnames(myData %>% select(-c("run","year"))
myData <- group_by(run) %>% summarize_at(vars(dataCols), list(auc = trapz(year,.)))

but I can't get it to work without error. I've tried different variations on this, but can't seem to get it right.

Is this possible? If so, how do I do it?

Can you post dput(df %>% group_by(run) %>% slice(1:3)) so that we have a reproducible dataset like yours to try?
– Matias Andina, Commented Nov 12, 2019 at 14:33
Also, I would try df %>% group_by(run) %>% summarise_at(vars(starts_with("data")), custom_function). You might need some workaround to feed the year into custom_function.
– Matias Andina, Commented Nov 12, 2019 at 14:37

Iaroslav Domin · Accepted Answer · 2019-11-12 14:40:23Z

library(dplyr)
library(pracma)

set.seed(1)
df <- tibble(
  run   = rep(1:3, each = 3),
  year  = rep(2001:2003, 3),
  data1 = runif(9, 2, 3),
  data2 = runif(9, 30, 50),
  data3 = runif(9, 20, 40)
)
df
#> # A tibble: 9 x 5
#>     run  year data1 data2 data3
#>   <int> <int> <dbl> <dbl> <dbl>
#> 1     1  2001  2.27  31.2  27.6
#> 2     1  2002  2.37  34.1  35.5
#> 3     1  2003  2.57  33.5  38.7
#> 4     2  2001  2.91  43.7  24.2
#> 5     2  2002  2.20  37.7  33.0
#> 6     2  2003  2.90  45.4  22.5
#> 7     3  2001  2.94  40.0  25.3
#> 8     3  2002  2.66  44.4  27.7
#> 9     3  2003  2.63  49.8  20.3

df %>% 
  group_by(run) %>% 
  summarise_at(vars(starts_with("data")), list(auc = ~trapz(year, .)))
#> # A tibble: 3 x 4
#>     run data1_auc data2_auc data3_auc
#>   <int>     <dbl>     <dbl>     <dbl>
#> 1     1      4.79      66.5      68.7
#> 2     2      5.10      82.3      56.4
#> 3     3      5.45      89.2      50.5
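
The attempt in the question differs from the call above in two ways: group_by(run) is never given the data frame, and trapz(year, .) is written without the leading ~, so it is evaluated immediately instead of being passed as a function to apply to each column. For readers on dplyr 1.0.0 or later, where summarise_at() is superseded by across(), a minimal sketch of the same idea might look like the following. The small data frame and the name auc are made up for illustration, with values chosen so the areas are easy to check by hand.

```r
library(dplyr)
library(pracma)

# Hypothetical toy data: two runs, three years each.
toy <- tibble(
  run   = rep(1:2, each = 3),
  year  = rep(2001:2003, 2),
  data1 = c(1, 2, 3, 2, 2, 2),
  data2 = c(10, 10, 10, 0, 4, 0)
)

# The lambda (~) is applied once per selected column and per group;
# `year` is looked up in the current group, `.x` is the current column.
auc <- toy %>%
  group_by(run) %>%
  summarise(across(starts_with("data"),
                   ~ trapz(year, .x),
                   .names = "{.col}_auc"))

auc
# By hand: run 1 gives data1_auc = 4 and data2_auc = 20;
#          run 2 gives data1_auc = 4 and data2_auc = 4.
```

The .names argument reproduces the data1_auc style of column names that list(auc = ...) produces with summarise_at().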
