1

I have multiple data frames (moving temperatures of different durations at 130 observation points) and want to generate monthly averages for all of the data by applying the code below to each data frame, then combine the results into one data frame. I have been trying to do this with a for loop, but I am not getting anywhere. I'm relatively new to R and would really appreciate it if someone could help me get through this.

Here is a glimpse of one of the data frames:

head(maxT2016[,1:5])

      X       X0       X1       X2       X3
1 20160101 26.08987 26.08987 26.08987 26.08987
2 20160102 25.58242 25.58242 25.58242 25.58242
3 20160103 25.44290 25.44290 25.44290 25.44290
4 20160104 26.88043 26.88043 26.88043 26.88043
5 20160105 26.60278 26.60278 26.60278 26.60278
6 20160106 24.87676 24.87676 24.87676 24.87676

str(maxT2016)
'data.frame':   274 obs. of  132 variables:
$ X   : int  20160101 20160102 20160103 20160104 20160105 20160106 20160107 20160108 20160109 20160110 ...
$ X0  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X1  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X2  : num  26.1 25.6 25.4 26.9 26.6 ...
$ X3  : num  26.1 25.6 25.4 26.9 26.6 ...

Here is the code that I use for individual data frame:

library(tidyverse)  # attaches dplyr, among others
library(lubridate)  # floor_date()

# Convert the integer yyyymmdd column to Date
maxT10$X <- as.Date(as.character(maxT10$X), format = "%Y%m%d")

monthlyAvr <- maxT10 %>%
  group_by(month = floor_date(X, "month")) %>%
  summarise(across(X0:X130, ~ mean(.x, na.rm = TRUE))) %>%
  slice_tail(n = 6) %>%  # keep the last six months (Apr to Sep)
  select(-month)

# Transpose so that each row is an observation point and each column a month
monthlyAvr2 <- as.data.frame(t(monthlyAvr))
colnames(monthlyAvr2) <- c("meanT_Apr", "meanT_May", "meanT_Jun", "meanT_Jul", "meanT_Aug",
"meanT_Sep")

Essentially, I want to put all the data frames into a list, run the function on each of them, then collect the outputs into one data frame. I came across the lapply function as an alternative, but I feel somewhat more comfortable with a for loop.

d = list(maxT10, maxT20, maxT30, maxT60 ... ...)

for (i in 1:length(d)){

}

MonthlyAvrT <- cbind(maxT10, maxT20, maxT30, maxT60... ... ) 

2 Answers

2

Basil. Welcome to StackOverflow.

I was wary of lapply when I first started using R, but you should stick with it. It's almost always cleaner, and often more efficient, than using a for loop. In your particular case, you can put your individual data frames in a list and wrap the code you run on each of them in a function, myFunc say, which takes the data frame you want to process as its argument.

Then you can simply say

allData <- bind_rows(lapply(1:length(dataFrameList), function(x) myFunc(dataFrameList[[x]])))
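As a rough sketch of what that could look like, assuming myFunc simply wraps the date conversion and monthly summary from your question: the toy data frames below (two months, two observation points), the makeToy helper and the "source" id column are made up for illustration, so adapt them to your real data.

```r
library(dplyr)
library(lubridate)

# Hypothetical wrapper around the question's code:
# one data frame in, one row of monthly means per month out.
myFunc <- function(df) {
  df %>%
    mutate(X = as.Date(as.character(X), format = "%Y%m%d")) %>%
    group_by(month = floor_date(X, "month")) %>%
    summarise(across(-X, ~ mean(.x, na.rm = TRUE)))
}

# Toy stand-ins for maxT10, maxT20, ... (assumed shape: X = yyyymmdd, X0, X1, ...)
days <- seq(as.Date("2016-01-01"), as.Date("2016-02-29"), by = "day")
makeToy <- function(offset) {
  data.frame(
    X  = as.integer(format(days, "%Y%m%d")),
    X0 = offset + seq_along(days) / 10,
    X1 = offset - seq_along(days) / 10
  )
}
dataFrameList <- list(maxT10 = makeToy(25), maxT20 = makeToy(26))

# Apply myFunc to every data frame and stack the results;
# .id records which list element each row came from.
allData <- bind_rows(lapply(dataFrameList, myFunc), .id = "source")
```

Naming the list elements is optional, but it lets bind_rows label each block of rows with the data frame it came from.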

Incidentally, your column names make me think your data isn't yet tidy. I'd suggest you spend a little time making it so before you do much else. It will save you a huge amount of effort in the long run.
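To give an idea of what I mean by tidy, here is a small sketch. It assumes that X0, X1, ... are observation points and that X holds the date as yyyymmdd; the toy values and the names station, temp and meanT are my own choices, not anything taken from your data.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy wide data in the same shape as the question:
# one date column and one column per observation point.
wide <- data.frame(
  X  = c(20160101L, 20160102L, 20160201L, 20160202L),
  X0 = c(26, 25, 24, 23),
  X1 = c(20, 21, 22, 23)
)

# Tidy form: one row per (date, station) pair
tidy <- wide %>%
  mutate(date = as.Date(as.character(X), format = "%Y%m%d")) %>%
  select(-X) %>%
  pivot_longer(-date, names_to = "station", values_to = "temp")

# The monthly average is then a single grouped summary
monthly <- tidy %>%
  group_by(station, month = floor_date(date, "month")) %>%
  summarise(meanT = mean(temp, na.rm = TRUE), .groups = "drop")
```

Once the data is in this long format, the different durations could be stacked with an extra column and added to the group_by, rather than kept in separate data frames.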


5 Comments

Very good suggestions. In your example, x is the index of the data.frame in the list. @Basil, you can also pass the list of data.frames to lapply directly and apply your function to each data.frame: lapply(dataFrameList, function(x) myFunc(x)), or simply lapply(dataFrameList, myFunc).
Thank you @Limey and @starja! The tidy data article is already helping me realize that I have to go back several steps to where I started processing my original data. After that I'll explore 'lapply'.
You're welcome, Basil. It will only be a small step back, but I promise you it will be worthwhile in the long run. IMHO, the vast majority of "I have a difficult data processing step" questions that are posted on SO are due to having chosen an inappropriate data format earlier in the workflow.
@Limey, you were right. I went all the way back to the original data frame I started off with. It was indeed messy. I followed the principles of tidy data. Then all the succeeding steps I wanted to take became suddenly easy - even managed without lapply. Many thanks!
I'm very happy to have helped. :)
0

The logic in pseudo-code would be:

for each data.frame in list
    apply a function
    save the results

Applying my_function to each data.frame in data_set:

my_function <- function(my_df) {

  my_df <- as.data.frame(my_df)
  out <- apply(my_df, 2, mean)  # compute mean on dimension 2 (columns)
  return(out)

}

# 100 toy data.frames; replicate() simplifies them into a 2 x 100 list matrix (one column per data.frame)
data_set <- replicate(100, data.frame(X=runif(6, 20160101, 20160131), X0=rnorm(6, 25)))
> dim(data_set) 
[1]   2 100
results <- apply(data_set, 2, my_function)  # Apply my_function on dimension 2

# Output for first 5 data.frames
> results[, 1:5]
           [,1]         [,2]         [,3]         [,4]         [,5]
X  2.016012e+07 2.016011e+07 2.016011e+07 2.016012e+07 2.016011e+07
X0 2.533888e+01 2.495086e+01 2.523087e+01 2.491822e+01 2.482142e+01
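If your data frames live in an ordinary list, as in the question, the same idea can be sketched as follows. The names data_list, results_loop and results_apply are made up for this example, and colMeans stands in for whatever function you actually want to apply.

```r
set.seed(1)  # make the toy data reproducible

# simplify = FALSE keeps the result as a plain list of 100 data.frames
data_list <- replicate(
  100,
  data.frame(X = runif(6, 20160101, 20160131), X0 = rnorm(6, 25)),
  simplify = FALSE
)

# for-loop version: preallocate a list, then fill it one element at a time
results_loop <- vector("list", length(data_list))
for (i in seq_along(data_list)) {
  results_loop[[i]] <- colMeans(data_list[[i]])
}

# sapply version: same computation, results collected into a 2 x 100 matrix
results_apply <- sapply(data_list, colMeans)
```

Both versions compute the same column means; do.call(cbind, results_loop) turns the loop output into the same 2 x 100 layout as results_apply.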

1 Comment

Glad it helps! Don't forget to mark your question as answered if it does answer your question.
