I want to add a column to each of the data frames in my list table after I run this code:

#list of my dataframes
df <- list(df1,df2,df3,df4)

#compute stats per level of the second column
stats <- function(d) {
  do.call(rbind, lapply(split(d, d[, 2]), function(x) {
    data.frame(Nb = length(x$Year), Mean = mean(x$A), SD = sd(x$A))
  }))
}

#Apply to list of dataframes
table <- lapply(df, stats)

This column, which I will call Source for example, should hold the name of the data frame, alongside the Nb, Mean and SD variables. So Source should contain df1, df1, df1, ... for table[[1]], and so on.

Is there any way I can add it in my code above?

1 Answer

Here's a different way of doing things:

First, let's start with some reproducible data:

set.seed(1)
n = 10
dat <- list(data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
            data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
            data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
            data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)))

Then, you want a function that adds columns to a data.frame. The obvious candidate is within. The values you want to calculate are constant across all observations in a given category, so use ave for each of the columns you want to add. Here's your new function:

stat <- function(d){
    within(d, {
        Nb = ave(a, b, FUN=length)
        Mean = ave(a, b, FUN=mean)
        SD = ave(a, b, FUN=sd)
    })        
}
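
As a quick check of what ave returns, here is a small sketch on made-up vectors (x and g are illustrative names, not part of your data). Each group's statistic is repeated for every element of that group, which is why the result can be stored directly as a column:

```r
# Hypothetical toy vectors, used only to illustrate ave()
x <- c(1, 2, 3, 10, 20)
g <- c(1, 1, 1, 2, 2)

# ave() applies FUN within each level of g and puts the result back
# at the original positions, so the output has the same length as x
group_mean <- ave(x, g, FUN = mean)
group_n    <- ave(x, g, FUN = length)

print(group_mean)  # 2 2 2 15 15
print(group_n)     # 3 3 3 2 2
```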

Then just lapply it to your list of data.frames:

lapply(dat, stat)

As you can see, columns are added as appropriate:

> str(lapply(dat, stat))
List of 4
 $ :'data.frame':       10 obs. of  5 variables:
  ..$ a   : num [1:10] -0.626 0.184 -0.836 1.595 0.33 ...
  ..$ b   : int [1:10] 3 1 2 1 1 2 1 2 3 2
  ..$ SD  : num [1:10] 0.85 0.643 0.738 0.643 0.643 ...
  ..$ Mean: num [1:10] -0.0253 0.649 -0.3058 0.649 0.649 ...
  ..$ Nb  : num [1:10] 2 4 4 4 4 4 4 4 2 4
 $ :'data.frame':       10 obs. of  5 variables:
  ..$ a   : num [1:10] -0.0449 -0.0162 0.9438 0.8212 0.5939 ...
  ..$ b   : int [1:10] 2 3 2 1 1 1 1 2 2 2
  ..$ SD  : num [1:10] 1.141 NA 1.141 0.136 0.136 ...
  ..$ Mean: num [1:10] -0.0792 -0.0162 -0.0792 0.7791 0.7791 ...
  ..$ Nb  : num [1:10] 5 1 5 4 4 4 4 5 5 5
 $ :'data.frame':       10 obs. of  5 variables:
  ..$ a   : num [1:10] 1.3587 -0.1028 0.3877 -0.0538 -1.3771 ...
  ..$ b   : int [1:10] 2 3 2 1 3 1 3 1 1 1
  ..$ SD  : num [1:10] 0.687 0.668 0.687 0.635 0.668 ...
  ..$ Mean: num [1:10] 0.873 -0.625 0.873 0.267 -0.625 ...
  ..$ Nb  : num [1:10] 2 3 2 5 3 5 3 5 5 5
 $ :'data.frame':       10 obs. of  5 variables:
  ..$ a   : num [1:10] -0.707 0.365 0.769 -0.112 0.881 ...
  ..$ b   : int [1:10] 3 3 2 2 1 1 3 1 2 2
  ..$ SD  : num [1:10] 0.593 0.593 1.111 1.111 0.297 ...
  ..$ Mean: num [1:10] -0.318 -0.318 0.24 0.24 0.54 ...
  ..$ Nb  : num [1:10] 3 3 4 4 3 3 3 3 4 4

1 Comment

That's a clever and easy way to do it! Actually, the results I got are fine; I just wanted to add another column to my final list. All I want is one more column that holds the name of the data frame the statistics were calculated from. Can you please show me how to do that with your example?
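
One possible way to get that Source column, sketched on the example data from the answer. It assumes the list is given names first: a list built with list(df1, df2, ...) carries no names, so the df1, df2, ... labels below are assigned by hand, and res is just an illustrative name for the result.

```r
# Rebuild example data in the same shape as the answer's (columns a and b)
set.seed(1)
n <- 10
dat <- lapply(1:4, function(i) {
    data.frame(a = rnorm(n), b = sample(1:3, n, TRUE))
})

# stat() as defined in the answer
stat <- function(d) {
    within(d, {
        Nb = ave(a, b, FUN = length)
        Mean = ave(a, b, FUN = mean)
        SD = ave(a, b, FUN = sd)
    })
}

# Assumption: name the list elements; these labels stand in for df1, df2, ...
names(dat) <- paste0("df", seq_along(dat))

# Map() walks the data frames and their names in parallel,
# so each result can be tagged with the name of its source
res <- Map(function(d, nm) {
    out <- stat(d)
    out$Source <- nm
    out
}, dat, names(dat))

str(res[[1]])
```

The same Map() call should work with the stats function from the question in place of stat, since it only relies on the function returning a data.frame.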
