I'm trying to get the area under the curve of some data for each run of a set of simulation runs. My data is of the form:

run    year    data1    data2    data3
---    ----    -----    -----    -----
1      2001    2.3      45.6     30.2
1      2002    2.4      35.4     23.4
1      2003    2.6      45.6     23.6
2      2001    2.3      45.6     30.2
2      2002    2.4      35.4     23.4
2      2003    2.6      45.6     23.6
3      2001 ... and so on

So, I'd like to get the area under the curve for each data trace for run 1, run 2, ... where the x axis is always the year column and the y axis is each data column. So, as output I want something like:

run    Data1_auc    Data2_auc    Data3_auc
---    ---------    ---------    ---------
1      4.5          6.7          27.5
2      3.4          6.8          35.4
3      4.5          7.8          45.6

(These are not actual areas for the data above)

I want to use the 'trapz' function from the pracma package to compute the area. It takes x and y values: trapz(x, y), where x = year in my case and y = a data column.
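
For reference, the trapezoidal rule that trapz(x, y) applies can be sketched in base R. The helper name trapezoid_area below is hypothetical (it is not part of pracma) and is only meant as a sanity check, here using the run 1 values of data1 from the table above:

```r
# Hypothetical helper (not from pracma): trapezoidal rule over paired x/y values.
# Each segment contributes width * average height of its two endpoints.
trapezoid_area <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

year  <- c(2001, 2002, 2003)
data1 <- c(2.3, 2.4, 2.6)

trapezoid_area(year, data1)
#> [1] 4.85
```

With yearly spacing each segment has width 1, so the area is (2.3 + 2.4) / 2 + (2.4 + 2.6) / 2 = 4.85.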

I've tried

dataCols <- colnames(myData %>% select(-c("run","year"))
myData <- group_by(run) %>% summarize_at(vars(dataCols), list(auc = trapz(year,.)))

but I can't get it to work without error. I've tried different variations on this, but can't seem to get it right.

Is this possible? If so, how do I do it?

  • Can you post dput(df %>% group_by(run) %>% slice(1:3)) so that we have a reproducible dataset like yours to try? Commented Nov 12, 2019 at 14:33
  • Also, I would try df %>% group_by(run) %>% summarise_at(vars(starts_with("data")), custom_function). You might need some workaround to feed the year into custom_function. Commented Nov 12, 2019 at 14:37

1 Answer

library(dplyr)
library(pracma)

set.seed(1)
df <- tibble(
  run   = rep(1:3, each = 3),
  year  = rep(2001:2003, 3),
  data1 = runif(9, 2, 3),
  data2 = runif(9, 30, 50),
  data3 = runif(9, 20, 40)
)
df
#> # A tibble: 9 x 5
#>     run  year data1 data2 data3
#>   <int> <int> <dbl> <dbl> <dbl>
#> 1     1  2001  2.27  31.2  27.6
#> 2     1  2002  2.37  34.1  35.5
#> 3     1  2003  2.57  33.5  38.7
#> 4     2  2001  2.91  43.7  24.2
#> 5     2  2002  2.20  37.7  33.0
#> 6     2  2003  2.90  45.4  22.5
#> 7     3  2001  2.94  40.0  25.3
#> 8     3  2002  2.66  44.4  27.7
#> 9     3  2003  2.63  49.8  20.3

df %>% 
  group_by(run) %>% 
  summarise_at(vars(starts_with("data")), list(auc = ~trapz(year, .)))
#> # A tibble: 3 x 4
#>     run data1_auc data2_auc data3_auc
#>   <int>     <dbl>     <dbl>     <dbl>
#> 1     1      4.79      66.5      68.7
#> 2     2      5.10      82.3      56.4
#> 3     3      5.45      89.2      50.5
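
The key difference from the attempt in the question appears to be the `~` in front of trapz(year, .). Without it, trapz(year, .) is evaluated immediately instead of being passed as a function to apply to each column. The attempt also calls group_by(run) without a data frame and is missing a closing parenthesis on the dataCols line. Since summarise_at() is superseded as of dplyr 1.0.0, here is a minimal sketch of the same computation with across(), assuming dplyr 1.0.0 or later. The name auc_across is only illustrative:

```r
library(dplyr)
library(pracma)

# Same simulated data as above
set.seed(1)
df <- tibble(
  run   = rep(1:3, each = 3),
  year  = rep(2001:2003, 3),
  data1 = runif(9, 2, 3),
  data2 = runif(9, 30, 50),
  data3 = runif(9, 20, 40)
)

# For each run, integrate every data* column against year.
# Inside the lambda, .x is the current column and year is the
# year column restricted to the current group.
auc_across <- df %>%
  group_by(run) %>%
  summarise(across(starts_with("data"), ~ trapz(year, .x), .names = "{.col}_auc"))

auc_across
```

The `.names = "{.col}_auc"` argument reproduces the data1_auc, data2_auc, data3_auc column names that list(auc = ...) generates in summarise_at().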