Combine and aggregate multiple data.frames

Question

I have a collection of .csv files, each with the same number of rows and columns. Each file contains observations (column 'value') of some test subjects characterised by A, B and C, and takes a form similar to the following:

A B C value
1 1 1 0.5
1 1 2 0.6
1 2 1 0.1
1 2 2 0.2
. . . .

Suppose each file is read into a separate data frame. What would be the most efficient way to combine these data frames into a single data frame in which the 'value' column contains the means or, more generally, the results of some function call over all 'value' rows for a given test subject? Columns A, B and C are constant across all files and can be viewed as keys for these observations.

Thank you for your help.

Thomas · Accepted Answer · 2014-03-03 10:42:07Z

This should be pretty easy, assuming that the files are all ordered in the same way:

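# read every .csv file in the working directory into a list of data frames
dflist <- lapply(dir(pattern = '\\.csv$'), read.csv)
# bind the 'value' columns side by side, one column per file
values <- do.call('cbind', lapply(dflist, `[`, 'value'))
# row means:
rowMeans(values)
# other function `myfun` applied to each row:
apply(values, 1, myfun)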
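
To get back a single data frame, as the question asks, the row-wise result can be attached to the key columns of one of the files. A minimal sketch, assuming every file lists the same A, B, C rows in the same order; the two small in-memory data frames stand in for files read with read.csv, and the name `combined` is illustrative only:

```r
# two toy data frames standing in for files read with read.csv (assumed data)
dflist <- list(
    data.frame(A = c(1, 1), B = c(1, 1), C = c(1, 2), value = c(0.5, 0.6)),
    data.frame(A = c(1, 1), B = c(1, 1), C = c(1, 2), value = c(0.7, 0.2))
)
# matrix with one column of values per file
values <- do.call(cbind, lapply(dflist, `[[`, 'value'))
# reuse the key columns of the first file and add the row means as 'value'
combined <- dflist[[1]][, c('A', 'B', 'C')]
combined$value <- rowMeans(values)
combined
```

Replacing rowMeans(values) with apply(values, 1, myfun) gives the general case, provided myfun returns a single value per row.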


Retired Data Munger · Accepted Answer · 2014-03-03 13:09:22Z

Here is another solution for the case where the keys might be in any order, or some might be missing:

n <- 10  # number of csv files to create
obs <- 10  # number of observations per file
# create test files in the session's temporary directory
for (i in 1:n){
    df <- data.frame(A = sample(1:3, obs, TRUE)
                , B = sample(1:3, obs, TRUE)
                , C = sample(1:3, obs, TRUE)
                , value = runif(obs)
                )
    write.csv(df, file = tempfile(fileext = '.csv'), row.names = FALSE)
}


# read in the data
# (the pattern is a regular expression, not a glob, so match a literal ".csv" suffix)
input <- lapply(list.files(tempdir(), "\\.csv$", full.names = TRUE)
    , read.csv
    )

# put the data frames together and then compute the mean for each unique
# combination of A, B & C, assuming that they could be in any order.
input <- do.call(rbind, input)
result <- lapply(split(input, list(input$A, input$B, input$C), drop = TRUE)
    , function(sect){
        sect$value[1L] <- mean(sect$value)
        sect[1L, ]
    }
)

# create output DF
result <- do.call(rbind, result)
result
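
For comparison, base R's aggregate() can do the same grouping in a single call. A minimal sketch, assuming a stacked data frame shaped like `input` above; the toy data here is made up for illustration:

```r
# toy stacked data standing in for do.call(rbind, input) (assumed data)
input <- data.frame(
    A = c(1, 1, 2, 1),
    B = c(1, 1, 1, 1),
    C = c(1, 2, 1, 1),
    value = c(0.5, 0.6, 0.1, 0.7)
)
# mean of 'value' for each combination of A, B and C that occurs in the data
result <- aggregate(value ~ A + B + C, data = input, FUN = mean)
result
```

Any other summary function can be passed as FUN in place of mean.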
