
I have a collection of .csv files, each with the same number of rows and columns. Each file contains observations (column 'value') of some test subjects characterised by A, B and C, and looks similar to the following:

A B C value
1 1 1 0.5
1 1 2 0.6
1 2 1 0.1
1 2 2 0.2
. . . .

Suppose each file is read into a separate data frame. What would be the most efficient way to combine these data frames into a single data frame in which the 'value' column contains the means or, more generally, the results of some function call over all 'value' entries for a given test subject? Columns A, B and C are constant across all files and can be viewed as keys for these observations.

Thank you for your help.

2 Answers


This should be pretty easy, assuming that the files are all ordered in the same way:

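# read every file whose name ends in .csv into a list of data frames
dflist <- lapply(dir(pattern = '\\.csv$'), read.csv)
# one column of 'value' per file; rows are matched by position
values <- do.call('cbind', lapply(dflist, `[`, 'value'))
# row means:
rowMeans(values)
# other function `myfun` applied to each row:
apply(values, 1, myfun)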
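The calls above return only the summarised values, not a data frame with the keys attached. A minimal sketch of how the result might be joined back to A, B and C follows. It assumes every file lists the keys in the same row order; the two small data frames are hypothetical stand-ins for the files read with read.csv, and the key check is an added safeguard, not part of the original answer.

```r
# Two small stand-in data frames with identical keys in identical order
# (hypothetical data, standing in for the files read by read.csv).
dflist <- list(
  data.frame(A = c(1, 1), B = c(1, 2), C = c(1, 1), value = c(0.5, 0.1)),
  data.frame(A = c(1, 1), B = c(1, 2), C = c(1, 1), value = c(0.7, 0.3))
)

# Guard: the positional approach is only valid if the keys line up row by row.
keys <- dflist[[1]][c("A", "B", "C")]
same_keys <- vapply(
  dflist,
  function(d) isTRUE(all.equal(d[c("A", "B", "C")], keys)),
  logical(1)
)
stopifnot(all(same_keys))

# One column of 'value' per file, then one summary per row.
values <- do.call(cbind, lapply(dflist, `[[`, "value"))
combined <- cbind(keys, value = rowMeans(values))
combined
```

To use a different summary, rowMeans(values) can be replaced with apply(values, 1, myfun), provided myfun returns a single number per row.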


Here is another solution for the case where the keys might be in any order, or some might be missing:

n <- 10  # of csv files to create
obs <- 10  # of observations
# create test files
for (i in 1:n){
    df <- data.frame(A = sample(1:3, obs, TRUE)
                , B = sample(1:3, obs, TRUE)
                , C = sample(1:3, obs, TRUE)
                , value = runif(obs)
                )
    write.csv(df, file = tempfile(fileext = '.csv'), row.names = FALSE)
}


# read in the data
input <- lapply(list.files(tempdir(), "\\.csv$", full.names = TRUE)
    , function(file) read.csv(file)
    )

# put the data frames together, then compute the mean for each unique combination
# of A, B & C assuming that they could be in any order.
input <- do.call(rbind, input)
result <- lapply(split(input, list(input$A, input$B, input$C), drop = TRUE)
    , function(sect){
        sect$value[1L] <- mean(sect$value)
        sect[1L, ]
    }
)

# create output DF
result <- do.call(rbind, result)
result
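The split/lapply/rbind steps above can probably be condensed with base R's aggregate(), which groups by the key columns and applies a function to 'value' within each group. This is an alternative sketch, not the answer's own method; the small `input` data frame below is hypothetical and stands in for the stacked result of do.call(rbind, input).

```r
# Hypothetical stacked input, standing in for do.call(rbind, input);
# keys appear in arbitrary order and one combination occurs only once.
input <- data.frame(
  A = c(1, 2, 1, 2, 1),
  B = c(1, 1, 1, 1, 2),
  C = c(1, 1, 1, 1, 1),
  value = c(0.2, 0.4, 0.6, 0.8, 1.0)
)

# One row per observed (A, B, C) combination, with the mean of 'value'.
# Swap `mean` for any function that reduces a numeric vector to one number.
result <- aggregate(value ~ A + B + C, data = input, FUN = mean)
result
```

Only key combinations that actually occur in the data appear in the output, which matches the effect of drop = TRUE in the split() call.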
