I have many data frames that all share the same variables and structure. Each one holds individual-level data, and I would like to use functions to summarize it across all rows into new variables. That is, for every input data frame, I would like to create an output data frame with one row per specified age group, summarizing the variables named in regularVar_names, with the age group implemented flexibly. The function should count the rows where the variable is not NA.
In the code below, I subset the variable X_AGE80 to be between 18 and 84. Ultimately I need this function to work for different age groups that are subsets of a master dataset of adults, such as 18-20, 21-24, 25-84, 18 only, 19 only, etc. However, I was thinking along the lines of @margusl: this is easy to control from outside the function. It would be icing on the cake if the answer handled age groups in an elegant way.
This is how I tried to implement it.
Data:
input.ds.2018 = data.frame(Var1 = c(1, 1, NA, NA, 1, 2), Var2 = rep(c(1, 2), 3), V3 = c(NA, rep(2, 4), 1),
                           y_4 = c(NA, "y", "z", "l", "m", "n"), X_AGE80 = c(17, 18, NA, 84, 21, 72))
This is my attempted solution, but apparently . does not supply the input dataframe like I assumed.
library(dplyr)
# intended to count the rows of df where the column named VAR is not NA
calc_unwt_n_regularVar_fn = function(df, VAR){
  df %>% filter(!is.na(eval(parse(text = VAR)))) %>% nrow
}
# apply calc_unwt_n_regularVar_fn to age-group 18 to 84 for regular variables called Var1 and Var2
regularVar_names = c("Var1","Var2")
output = input.ds.2018 %>%
  filter(X_AGE80 <= 84) %>%
  filter(X_AGE80 >= 18) %>%
  summarize(across(all_of(regularVar_names), ~ calc_unwt_n_regularVar_fn(., cur_column()), .names = "unwt_denom_{.col}"))
However, inside across() the . appears to refer to the values of the current column (a numeric vector) rather than to the input data frame, so it throws an error:
Error in `summarize()`:
i In argument: `across(...)`.
Caused by error in `across()`:
! Can't compute column `unwt_denom_Var1`.
Caused by error in `UseMethod()`:
! no applicable method for 'filter' applied to an object of class "c('double', 'numeric')"
I also tried replacing . with .data to pass in the input data frame as a parameter, but that didn't work either.
So my questions are: (1) How do I pass the data frame as a parameter to the function calc_unwt_n_regularVar_fn? Or, if this is a dumb way to go about it, (2) how should I create the new summary variables, given that they are required for every combination of input data frame and age group?
What does calc_unwt_n_regularVar_fn calculate? I do not see where the age group gets specified automatically. Is it always 18 to 84?
calc_unwt_n_regularVar_fn, please explain. Your question is currently quite unclear. Also, you are missing a second data frame to allow demonstration for i>1, e.g. on a list of data frames (you said you have many). Are they collected in a list?
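One possible approach, offered as a minimal sketch rather than a definitive answer: inside across() the lambda receives the column values, so the non-NA count can be computed directly with sum(!is.na(.x)) and the data frame never needs to be passed in. The sketch assumes dplyr 1.0 or later and that the data frames are collected in a named list. The helper names count_non_na_by_age and summarize_all, the age_groups lookup table, and the list name ds2018 are all hypothetical and not from the question.

```r
library(dplyr)

# Example data from the question
input.ds.2018 <- data.frame(
  Var1 = c(1, 1, NA, NA, 1, 2),
  Var2 = rep(c(1, 2), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72)
)
regularVar_names <- c("Var1", "Var2")

# Hypothetical helper: keep rows whose age lies in [age_min, age_max]
# (rows with a missing age are dropped by filter), then count the
# non-NA values of each requested variable.
count_non_na_by_age <- function(df, vars, age_min, age_max, age_var = "X_AGE80") {
  df %>%
    filter(.data[[age_var]] >= age_min, .data[[age_var]] <= age_max) %>%
    summarize(across(all_of(vars), ~ sum(!is.na(.x)), .names = "unwt_denom_{.col}"))
}

# Hypothetical lookup table: one row per age group
age_groups <- data.frame(
  age_group = c("18-84", "18-20", "21-24", "25-84", "18 only"),
  age_min = c(18, 18, 21, 25, 18),
  age_max = c(84, 20, 24, 84, 18)
)

# Hypothetical driver: one output row per data frame / age group combination
summarize_all <- function(datasets, age_groups, vars) {
  bind_rows(lapply(names(datasets), function(ds_name) {
    bind_rows(lapply(seq_len(nrow(age_groups)), function(i) {
      counts <- count_non_na_by_age(
        datasets[[ds_name]], vars,
        age_groups$age_min[i], age_groups$age_max[i]
      )
      data.frame(dataset = ds_name, age_group = age_groups$age_group[i], counts)
    }))
  }))
}

# Only one data frame is available here; further ones would be added to the list
datasets <- list(ds2018 = input.ds.2018)
result <- summarize_all(datasets, age_groups, regularVar_names)
print(result)
```

Keeping the age bounds in a separate table means a new age group is one extra row rather than a change to the function, which matches the idea of controlling age groups from outside the function.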