I'm trying to compute the area under the curve (AUC) of several data columns for each run in a set of simulation runs. My data is of the form:
run  year  data1  data2  data3
---  ----  -----  -----  -----
1    2001  2.3    45.6   30.2
1    2002  2.4    35.4   23.4
1    2003  2.6    45.6   23.6
2    2001  2.3    45.6   30.2
2    2002  2.4    35.4   23.4
2    2003  2.6    45.6   23.6
3    2001  ... and so on
So I'd like to get the area under the curve of each data column for run 1, run 2, and so on, where the x axis is always the year column and the y axis is the data column. As output I want something like:
run  Data1_auc  Data2_auc  Data3_auc
---  ---------  ---------  ---------
1    4.5        6.7        27.5
2    3.4        6.8        35.4
3    4.5        7.8        45.6
(These are not the actual areas for the data above.)
I want to use the trapz function from the pracma package to compute each area. It takes x and y values, trapz(x, y), where in my case x is the year column and y is a data column.
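For reference, trapz(x, y) applies the trapezoidal rule: it adds up the areas of the trapezoids between consecutive (x, y) points. The minimal base-R sketch below only illustrates that arithmetic; trap_area() is a hypothetical helper, not part of pracma, and it assumes x is already sorted in increasing order.

```r
# Trapezoidal rule: sum over consecutive points of
# (x[i+1] - x[i]) * (y[i] + y[i+1]) / 2.
# trap_area() is a hypothetical helper for illustration only;
# pracma::trapz(x, y) is the function the question actually uses.
trap_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  sum(diff(x) * (y[-1] + y[-n]) / 2)
}

# Run 1, data1 column from the sample table above
year  <- c(2001, 2002, 2003)
data1 <- c(2.3, 2.4, 2.6)
trap_area(year, data1)
```

For run 1, data1 this gives (2.3 + 2.4) / 2 + (2.4 + 2.6) / 2 = 4.85, which is the per-column, per-run number the summary table should contain.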
I've tried:
dataCols <- colnames(myData %>% select(-c("run", "year")))
myData <- myData %>% group_by(run) %>% summarize_at(vars(dataCols), list(auc = trapz(year, .)))
but I can't get it to work without an error. I've tried several variations on this, but can't seem to get it right.
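A minimal sketch of one way to do this, assuming dplyr 1.0.0 or later (for across()) and pracma are installed. The key difference from the attempt above is the ~ lambda: list(auc = trapz(year, .)) calls trapz() once, immediately, outside the grouped data, whereas ~ trapz(year, .x) is called once per run and per data column, with year taken from the same group. The data frame holds only the rows for runs 1 and 2 shown in the question, and the arrange() step is an added assumption so that each run is integrated in year order.

```r
library(dplyr)
library(pracma)

# Sample data: runs 1 and 2 from the table in the question
myData <- data.frame(
  run   = rep(1:2, each = 3),
  year  = rep(2001:2003, times = 2),
  data1 = rep(c(2.3, 2.4, 2.6), times = 2),
  data2 = rep(c(45.6, 35.4, 45.6), times = 2),
  data3 = rep(c(30.2, 23.4, 23.6), times = 2)
)

# Sort by year within each run, then for every column whose name starts
# with "data", call trapz(year, <column>) separately for each run.
# .names controls the output names: data1_auc, data2_auc, data3_auc.
auc <- myData %>%
  arrange(run, year) %>%
  group_by(run) %>%
  summarise(across(starts_with("data"),
                   ~ trapz(year, .x),
                   .names = "{.col}_auc"))

print(auc)
```

The result has one row per run and one *_auc column per data column. Because runs 1 and 2 are identical in this sample, both rows come out the same (4.85, 81 and 50.3).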
Is this possible? If so, how do I do it?
dput(df %>% group_by(run) %>% slice(1:3)) so that we have a reproducible dataset like yours to try?
df %>% group_by(run) %>% summarise_at(vars(starts_with("data")), custom_function). You might need some workaround to feed the year into custom_function.
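The custom_function idea can also be sketched without dplyr. In the base-R version below, auc_by_run() is a hypothetical helper: split() breaks the data into one data frame per run, so the helper receives the year column alongside the data columns and can integrate each one with the trapezoidal rule (the same arithmetic as pracma::trapz when x is sorted). The sample data again covers only runs 1 and 2 from the question.

```r
# Hypothetical per-run helper: integrates every column in `cols`
# against year using the trapezoidal rule.
auc_by_run <- function(d, cols) {
  d <- d[order(d$year), ]          # integrate in increasing year order
  n <- nrow(d)
  sapply(d[cols], function(y) sum(diff(d$year) * (y[-1] + y[-n]) / 2))
}

# Sample data: runs 1 and 2 from the table in the question
myData <- data.frame(
  run   = rep(1:2, each = 3),
  year  = rep(2001:2003, times = 2),
  data1 = rep(c(2.3, 2.4, 2.6), times = 2),
  data2 = rep(c(45.6, 35.4, 45.6), times = 2),
  data3 = rep(c(30.2, 23.4, 23.6), times = 2)
)

cols <- grep("^data", names(myData), value = TRUE)

# One row per run, one column per data column
res <- t(sapply(split(myData, myData$run), auc_by_run, cols = cols))
out <- data.frame(run = as.integer(rownames(res)), res, row.names = NULL)
names(out)[-1] <- paste0(cols, "_auc")

print(out)
```

This avoids the package dependency at the cost of a little more bookkeeping to rebuild the run column and the *_auc names.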