Apply a set of functions over multiple data frames and merge the outputs in R

Question

I have multiple data frames (moving temperatures of different durations at 130 observation points) and want to generate monthly averages for all of them by applying the code below to each data frame, then combining the results into one data frame. I have been trying to do this with a for loop, but I am not getting anywhere. I'm relatively new to R and would really appreciate it if someone could help me get through this.

Here is a glimpse of one data frame:

head(maxT2016[,1:5])

      X       X0       X1       X2       X3
1 20160101 26.08987 26.08987 26.08987 26.08987
2 20160102 25.58242 25.58242 25.58242 25.58242
3 20160103 25.44290 25.44290 25.44290 25.44290
4 20160104 26.88043 26.88043 26.88043 26.88043
5 20160105 26.60278 26.60278 26.60278 26.60278
6 20160106 24.87676 24.87676 24.87676 24.87676

str(maxT2016)
'data.frame':   274 obs. of  132 variables:
$ X   : int  20160101 20160102 20160103 20160104 20160105 20160106 20160107 20160108 20160109 20160110 ...
$ X0  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X1  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X2  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X3  : num  26.1 25.6 25.4 26.9 26.6 ...

Here is the code that I use for an individual data frame:

library(dplyr)
library(lubridate)
library(tidyverse)

# Convert the integer date column (e.g. 20160101) to Date
maxT10$X <- as.Date(as.character(maxT10$X), format = "%Y%m%d")

monthlyAvr <- maxT10 %>%
  group_by(month = floor_date(X, "month")) %>%
  summarise(across(X0:X130, ~ mean(.x, na.rm = TRUE))) %>%
  slice_tail(n = 6) %>%   # keep the last six months (Apr-Sep)
  select(-month)

# Transpose so each row is an observation point and each column a month
monthlyAvr2 <- as.data.frame(t(monthlyAvr))
colnames(monthlyAvr2) <- c("meanT_Apr", "meanT_May", "meanT_Jun", "meanT_Jul", "meanT_Aug",
"meanT_Sep")

Essentially, I want to put all the data frames into a list, run the function over each data frame, and then combine the outputs into one data frame. I came across the lapply function as an alternative, but felt somewhat more comfortable with a for loop.

d = list(maxT10, maxT20, maxT30, maxT60 ... ...)

for (i in 1:length(d)){

}

MonthlyAvrT <- cbind(maxT10, maxT20, maxT30, maxT60... ... )

Limey · Accepted Answer · 2020-06-07 14:29:24Z

2

Basil. Welcome to StackOverflow.

I was wary of lapply when I first started using R, but you should stick with it. It's almost always more efficient than using a for loop. In your particular case, you can put your individual data frames in a list and wrap the code you run on each one in a function, myFunc, say, which takes the data frame you want to process as its argument.

Then you can simply say

allData <- bind_rows(lapply(seq_along(dataFrameList), function(x) myFunc(dataFrameList[[x]])))
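As a rough sketch of what such a function might look like, the block below wraps the pipeline from the question in myFunc and runs it over a small list. The toy data (make_toy, two columns, three months) and the "source" id column are assumptions made purely for illustration; they are not part of the original question or answer.

```r
library(dplyr)
library(lubridate)

# Hypothetical wrapper around the question's pipeline: one row per month,
# one column per observation point.
myFunc <- function(df) {
  df %>%
    mutate(X = as.Date(as.character(X), format = "%Y%m%d")) %>%
    group_by(month = floor_date(X, "month")) %>%
    summarise(across(-X, ~ mean(.x, na.rm = TRUE)), .groups = "drop")
}

# Simulated stand-ins for maxT10, maxT20, ... (not the real data)
set.seed(1)
make_toy <- function() {
  days <- seq(as.Date("2016-01-01"), as.Date("2016-03-31"), by = "day")
  data.frame(
    X  = as.integer(format(days, "%Y%m%d")),
    X0 = rnorm(length(days), mean = 25),
    X1 = rnorm(length(days), mean = 25)
  )
}
dataFrameList <- list(maxT10 = make_toy(), maxT20 = make_toy())

# Apply myFunc to every data frame and stack the results;
# .id records which list element each row came from.
allData <- bind_rows(lapply(dataFrameList, myFunc), .id = "source")
```

Naming the list elements is optional, but it lets bind_rows() label each block of rows with the data frame it came from.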

Incidentally, your column names make me think your data isn't yet tidy. I'd suggest you spend a little time making it so before you do much else. It will save you a huge amount of effort in the long run.
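One possible reading of "tidy" here, offered only as a sketch, is one row per date and observation point rather than one column per point. The example below assumes tidyr's pivot_longer() for the reshaping; the column names point, maxT and meanT and the three-row sample are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Small wide-format sample in the same shape as the question's data
wide <- data.frame(
  X  = c(20160101L, 20160102L, 20160201L),
  X0 = c(26.1, 25.6, 25.4),
  X1 = c(26.9, 26.6, 24.9)
)

# Reshape to long format: one row per (date, observation point)
long <- wide %>%
  pivot_longer(-X, names_to = "point", values_to = "maxT") %>%
  mutate(
    date  = as.Date(as.character(X), format = "%Y%m%d"),
    month = format(date, "%Y-%m")
  )

# Monthly means now need only a grouped summary
monthly <- long %>%
  group_by(point, month) %>%
  summarise(meanT = mean(maxT, na.rm = TRUE), .groups = "drop")
```

In this layout, adding more observation points or more months changes the number of rows, not the code.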

edited Jun 7, 2020 at 14:29

answered Jun 7, 2020 at 13:59


5 Comments

starja Over a year ago

Very good suggestions. In your example, x is the index of the data.frame in the list. @Basil, you can also pass the list of data.frames to lapply directly and apply your function to each data.frame: lapply(dataFrameList, function(x) myFunc(x))

Basil Over a year ago

Thank you @Limey and @starja! The tidy data article is already helping me realize that I have to go back several steps to where I started processing my original data. After that I'll explore 'lapply'.

Limey Over a year ago

You're welcome, Basil. It will only be a small step back, but I promise you it will be worthwhile in the long run. IMHO, the vast majority of "I have a difficult data processing step" questions that are posted on SO are due to having chosen an inappropriate data format earlier in the workflow.

Basil Over a year ago

@Limey, you were right. I went all the way back to the original data frame I started off with. It was indeed messy. I followed the principles of tidy data. Then all the succeeding steps I wanted to take became suddenly easy - even managed without lapply. Many thanks!

Limey Over a year ago

I'm very happy to have helped. :)

Trusky · Accepted Answer · 2020-06-07 14:33:34Z

0

The logic in pseudo-code would be:

for each data.frame in list
    apply a function
    save the results
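The pseudo-code above maps fairly directly onto an R for loop. The following is a minimal base-R sketch, assuming a small simulated list (data_list) in place of the real data frames and column means as the function applied to each one.

```r
# Simulated list of data frames (stands in for the real list)
set.seed(1)
data_list <- list(
  a = data.frame(X0 = rnorm(6, 25), X1 = rnorm(6, 25)),
  b = data.frame(X0 = rnorm(6, 25), X1 = rnorm(6, 25))
)

# Pre-allocate a list to hold one result per data frame
results <- vector("list", length(data_list))
names(results) <- names(data_list)

for (i in seq_along(data_list)) {
  # apply a function and save the result
  results[[i]] <- colMeans(data_list[[i]], na.rm = TRUE)
}

# Combine into a matrix with one row per data frame
combined <- do.call(rbind, results)
```

Pre-allocating results and filling it by index avoids growing an object inside the loop, which is the usual reason R for loops end up slow.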

Applying my_function to each data.frame in data_set:

my_function <- function(my_df) {

  my_df <- as.data.frame(my_df)
  out <- apply(my_df, 2, mean)  # compute mean on dimension 2 (columns)
  return(out)

}

# 100 simulated data sets; replicate() simplifies them into a 2 x 100 list-matrix
data_set <- replicate(100, data.frame(X=runif(6, 20160101, 20160131), X0=rnorm(6, 25)))

> dim(data_set) 
[1]   2 100

results <- apply(data_set, 2, my_function)  # Apply my_function on dimension 2

# Output for first 5 data.frames

> results[, 1:5]
           [,1]         [,2]         [,3]         [,4]         [,5]
X  2.016012e+07 2.016011e+07 2.016011e+07 2.016012e+07 2.016011e+07
X0 2.533888e+01 2.495086e+01 2.523087e+01 2.491822e+01 2.482142e+01
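Note that replicate() in the example above returns a list-matrix, which is why apply() over dimension 2 works there. A plain list such as list(maxT10, maxT20, ...) has no dimensions, so apply() would fail on it. A sketch of the same computation on a plain list, assuming replicate(simplify = FALSE) to build the test data and vapply() to collect the results:

```r
set.seed(42)

# simplify = FALSE keeps the result as an ordinary list of 100 data frames
data_list <- replicate(
  100,
  data.frame(X = runif(6, 20160101, 20160131), X0 = rnorm(6, 25)),
  simplify = FALSE
)

# Column means of each data frame; vapply checks that every result
# is a numeric vector of length 2 and returns a 2 x 100 matrix
col_means <- vapply(data_list, colMeans, numeric(2))
```

The result has the same shape as results above: one row per column (X, X0) and one column per data frame.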

answered Jun 7, 2020 at 14:33


1 Comment

Trusky Over a year ago

Glad it helps! Don't forget to mark your question as answered if it does answer your question.
