1

Pardon me if this question has been answered before; I searched and couldn't find a match. I have a list of lists, each containing two data frames. I want to run t.test between the first row of data frame 1 and the first row of data frame 2, then between the second rows, and so on.

I tried this:

list1 <- list(set1 = data.frame(rnorm(100), rexp(100)), set2 = data.frame(rnorm(100, mean = 5, sd = 3), rexp(100, rate = 4))) 

list2 <- list(set1 = data.frame(rnorm(100), rexp(100)), set2 = data.frame(rnorm(100, mean = 6, sd = 4), rexp(100, rate = 2)))

mylist <- list(list1, list2)

ttest <- function(list) {
    df1 <- list$set1
    df2 <- list$set2
    # Preallocate one p-value slot per row
    testresults <- rep(NA, nrow(df1))
    # seq_len() is safe even when df1 has zero rows
    for (j in seq_len(nrow(df1))) {
        testresults[j] <- t.test(df1[j, ], df2[j, ])$p.value
    }
    return(as.matrix(testresults))
}
lapply(mylist, ttest)

This works, but it is slow: the for loop runs one t.test per row, and my actual data is much larger. I would like to replace the for loop with an apply-family function, if possible. Please suggest.

1
  • The bottleneck is the actual t.test function, which you can verify by profiling your code. Your memory-allocation + for loop approach is actually correct here, and the fastest way to do it. Commented Jul 3, 2014 at 16:39
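
If the per-call cost of t.test is the bottleneck, as the comment above suggests, one option is to skip t.test entirely and compute the Welch statistic for all rows at once. The following is only a sketch under stated assumptions: `row_welch_p` is a hypothetical helper name (not part of base R), it assumes a two-sided test with unequal variances (the t.test default), numeric columns, and no missing values.

```r
# Row-wise Welch two-sample t-test p-values without calling t.test().
# row_welch_p is a hypothetical helper, not a base R function.
row_welch_p <- function(df1, df2) {
  m1 <- as.matrix(df1)
  m2 <- as.matrix(df2)
  n1 <- ncol(m1)
  n2 <- ncol(m2)
  # Per-row sample variances (denominator n - 1)
  v1 <- rowSums((m1 - rowMeans(m1))^2) / (n1 - 1)
  v2 <- rowSums((m2 - rowMeans(m2))^2) / (n2 - 1)
  # Squared standard error of the difference in means
  se2 <- v1 / n1 + v2 / n2
  tstat <- (rowMeans(m1) - rowMeans(m2)) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
  # Two-sided p-value
  unname(2 * pt(-abs(tstat), df))
}

# Small check against t.test() on made-up data
set.seed(1)
a <- data.frame(x = rnorm(50), y = rexp(50), z = rnorm(50))
b <- data.frame(x = rnorm(50, mean = 1), y = rexp(50, rate = 2), z = rnorm(50, mean = 2))
p_fast <- row_welch_p(a, b)
p_loop <- vapply(seq_len(nrow(a)),
                 function(j) t.test(unlist(a[j, ]), unlist(b[j, ]))$p.value,
                 numeric(1))
all.equal(p_fast, p_loop)
```

With the list from the question, this could be applied as `lapply(mylist, function(l) row_welch_p(l$set1, l$set2))`. Note that it returns only p-values, not the full htest object.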

2 Answers

3

You basically want lapply with a function that takes more than one argument, which is what Map does. So you can replace ttest in your code with

ttest2 <- function(list) {
    df1 <- list$set1
    df2 <- list$set2
    l1 <- unlist(apply(df1, 1, list), recursive = FALSE)
    l2 <- unlist(apply(df2, 1, list), recursive = FALSE)
    testresults <- unlist(Map(function(x,y) t.test(x,y)$p.value, x=l1, y=l2))
    return(as.matrix(testresults))
}

This seems to be faster. I extended your data frames to 10000 rows (with 100 rows it runs too quickly to see much of a difference) and got

system.time(lapply(mylist,ttest))
#   user  system elapsed 
# 12.736   0.000  12.760 
system.time(lapply(mylist,ttest2))
#   user  system elapsed 
#  3.825   0.000   3.833 
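
For readers unfamiliar with Map, here is a tiny illustration of the idiom used in ttest2, on made-up vectors rather than the data above. Map calls the function on corresponding elements of its list arguments and returns a list; it is mapply with SIMPLIFY = FALSE.

```r
# Two parallel lists of numeric vectors (toy data for illustration only)
xs <- list(c(1, 2, 3), c(10, 20, 30))
ys <- list(c(4, 5, 6), c(40, 50, 60))

# Difference of means for each (x, y) pair; Map() returns a list
diffs <- Map(function(x, y) mean(x) - mean(y), xs, ys)
unlist(diffs)
# -3 -30

# mapply() simplifies the result to a vector directly
diffs_vec <- mapply(function(x, y) mean(x) - mean(y), xs, ys)
```

In ttest2 the same pattern is used with the rows of df1 and df2 as the paired elements and `t.test(x, y)$p.value` as the function.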

0

Try:

res1 <- sapply(mylist, function(x) {
                   x1 <- do.call(`cbind`,x)
                   apply(x1, 1, function(y) t.test(y[1:2], y[3:4])$p.value)
                })

Comparing with the result of your function:

res2 <- sapply(mylist, ttest)
identical(res1, res2)
#[1] TRUE
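
Note that the indices 1:2 and 3:4 above assume each data frame has exactly two columns. A possible variant that does not depend on the column count is sketched below; it converts each data frame to a matrix once and then indexes matrix rows. The `toy` list and its column names are made up for illustration and only mimic the shape of mylist.

```r
# Toy input shaped like mylist: a list of lists, each holding set1 and set2
set.seed(42)
toy <- list(
  list(set1 = data.frame(a = rnorm(5), b = rexp(5), c = rnorm(5)),
       set2 = data.frame(a = rnorm(5, mean = 5), b = rexp(5, rate = 4), c = rnorm(5, mean = 2))),
  list(set1 = data.frame(a = rnorm(5), b = rexp(5), c = rnorm(5)),
       set2 = data.frame(a = rnorm(5, mean = 6), b = rexp(5, rate = 2), c = rnorm(5, mean = 3)))
)

pvals <- sapply(toy, function(x) {
  # Convert once; extracting a matrix row is cheaper than a data frame row
  m1 <- as.matrix(x$set1)
  m2 <- as.matrix(x$set2)
  # One p-value per row, whatever the number of columns
  vapply(seq_len(nrow(m1)),
         function(j) t.test(m1[j, ], m2[j, ])$p.value,
         numeric(1))
})
dim(pvals)  # rows of the data frames by number of list elements
```

This still calls t.test once per row, so it should not be expected to remove the main cost; it only avoids the hard-coded column positions.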
