
I have multiple data files formatted like so:

Condition    Score  Reqresponse 
   Z          1         b   
   Y          0         a   

I want to read in multiple data files, get a mean score for each Condition/Reqresponse combination, and then tabulate those means in a master table. The means for each data file should populate one row of the master table (or list, whatever).

Here's what I've attempted

#loop reads data from source example with only 2 files named 1 and 2
for(i in 1:2)
{
n= paste(i,".txt", sep="")
data <- read.table(toString(n), header = TRUE, sep = "\t")

So far so good right? After this I get lost.

Score <- ave(x = data$Score, data$Condition, data$Reqresponse, FUN = mean)
table(Score)
}

This is all I've come up with. I don't know which cells in the table belong to which Condition x Reqresponse combo, or how to create a new row and then feed them into a master table.

By the way, if this is just a silly way to approach what I'm doing feel free to point that out >)

    The toString would be quite unnecessary. paste returns a character value. Commented Mar 11, 2013 at 6:19

2 Answers


This should work, although it could be optimized quite a bit:

all_data<-data.frame() #make empty data.frame (we don't know the size)
for(i in 1:2){ #go through all files    
  #add rows to the data frame
  all_data <- rbind(all_data,read.table(paste(i,".txt", sep=""), 
              header = TRUE, sep = "\t"))
}
#use tapply to compute mean
Score<-tapply(all_data$Score,list(all_data$Condition,all_data$Reqresponse),mean)
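To see which cell belongs to which Condition x Reqresponse combination, it may help to look at what tapply returns on a small example. The sketch below uses a made-up data frame in place of the real files (the values are illustrative assumptions, not data from the question). The result is a matrix whose row names are the Condition levels and whose column names are the Reqresponse levels.

```r
# Toy data standing in for the combined contents of the files (made-up values).
all_data <- data.frame(
  Condition   = c("Z", "Y", "Z", "Y", "Z", "Y"),
  Score       = c(1, 0, 1, 1, 0, 0),
  Reqresponse = c("b", "a", "b", "a", "a", "b")
)

# Naming the list elements labels the two dimensions of the result.
Score <- tapply(all_data$Score,
                list(Condition = all_data$Condition,
                     Reqresponse = all_data$Reqresponse),
                mean)

# Index a single cell by its labels: mean for Condition Z, Reqresponse b.
Score["Z", "b"]

# Long format: one row per Condition/Reqresponse combination.
as.data.frame(as.table(Score), responseName = "mean")
```

The long format produced by the last line is often easier to feed into later steps than the matrix.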

EDIT: Better performance could be achieved by not building the master data frame at all (although I'm not sure about the efficiency of xtabs vs tapply):

#number of files (two in the question's example)
n <- 2
#read the first file
data <- read.table(paste(1,".txt", sep=""), header = TRUE, sep = "\t")
#number of 1's in each Condition x Reqresponse cell (sum of the 0/1 Score)
score1 <- xtabs(Score ~ Condition + Reqresponse, data = data)
#number of 0's in each Condition x Reqresponse cell
score0 <- xtabs((Score == 0) ~ Condition + Reqresponse, data = data)
for(i in 2:n){ #go through the rest of the files
  data <- read.table(paste(i,".txt", sep=""), header = TRUE, sep = "\t")
  #add the counts from file i.txt to the running totals
  #(assumes every file contains the same Condition and Reqresponse levels)
  score1 <- score1 + xtabs(Score ~ Condition + Reqresponse, data = data)
  score0 <- score0 + xtabs((Score == 0) ~ Condition + Reqresponse, data = data)
}
#compute the means
Score <- score1/(score0+score1)
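As a sanity check on the idea of accumulating counts, here is a minimal sketch that compares the running-count result with the mean computed from the pooled data. It assumes Score is coded 0/1 and that both data sets share the same factor levels. The data frames f1 and f2 and the helper count_scores are hypothetical stand-ins for the files, not part of the original code.

```r
# Two made-up "files" with identical factor levels, so the tables line up.
make_file <- function(score) {
  data.frame(
    Condition   = factor(c("Z", "Y", "Z", "Y"), levels = c("Y", "Z")),
    Score       = score,
    Reqresponse = factor(c("b", "a", "a", "b"), levels = c("a", "b"))
  )
}
f1 <- make_file(c(1, 0, 1, 0))
f2 <- make_file(c(0, 1, 1, 0))

# Hypothetical helper: count rows with Score == value in each cell.
count_scores <- function(d, value) {
  xtabs((Score == value) ~ Condition + Reqresponse, data = d)
}

# Accumulate counts file by file, never holding both files at once.
ones  <- count_scores(f1, 1) + count_scores(f2, 1)
zeros <- count_scores(f1, 0) + count_scores(f2, 0)
running <- ones / (ones + zeros)

# Reference result: pool everything and take the mean per cell.
pooled_data <- rbind(f1, f2)
pooled <- tapply(pooled_data$Score,
                 list(pooled_data$Condition, pooled_data$Reqresponse),
                 mean)

all.equal(as.vector(running), as.vector(pooled))
```

Note that a cell with no observations comes out as NaN (0/0) with the counting approach, whereas tapply reports NA for it.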

3 Comments

+1, although for reading the data you could use apply style loops, these do not suffer from the issues of growing an object sequentially. See my answer.
I'm not growing anything sequentially in my second version, since I overwrite the previous data. I agree that the first version is quite inefficient, not only because all_data grows sequentially, but also because we are making one possibly huge data frame (which is circumvented in my second version).
Yes you are right, then my answer is only in regard to your first solution of reading all files into memory.

The answer of @Hemmo involves growing an object sequentially. If the number of files is large, this can become really slow. A more R-style approach is not to use the for loop, but to first create a vector of file names and then loop over them using an apply-style loop. I'll use an apply loop from the plyr package, as this makes life a little easier:

library(plyr)
file_list = sprintf("%s.txt", 1:2)
all_data = ldply(file_list, read.table, header = TRUE, sep = "\t")

After that you can use another plyr function to process the data:

ddply(all_data, .(Condition, Reqresponse), summarise, mn = mean(Score))

You could also use base R functions:

all_data = do.call("rbind", lapply(file_list, read.table, header = TRUE, sep = "\t"))
# Here I copy the tapply call of @Hemmo
Score<-tapply(all_data$Score,list(all_data$Condition,all_data$Reqresponse),mean)
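If the goal is one row per data file rather than one pooled table, the same apply-style idea can be used per file. The following is only a sketch under stated assumptions: the two files are written to a temporary directory with made-up values, the helper file_means is hypothetical, and every file is assumed to contain the same Condition/Reqresponse combinations (otherwise the rows would not line up).

```r
# Write two tiny tab-separated files to a temporary directory (made-up data).
tmp <- tempfile("scores")
dir.create(tmp)
d1 <- data.frame(Condition = c("Z", "Y", "Z", "Y"), Score = c(1, 0, 0, 1),
                 Reqresponse = c("b", "a", "a", "b"))
d2 <- data.frame(Condition = c("Z", "Y", "Z", "Y"), Score = c(0, 1, 1, 1),
                 Reqresponse = c("b", "a", "a", "b"))
write.table(d1, file.path(tmp, "1.txt"), sep = "\t", row.names = FALSE, quote = FALSE)
write.table(d2, file.path(tmp, "2.txt"), sep = "\t", row.names = FALSE, quote = FALSE)
file_list <- file.path(tmp, sprintf("%s.txt", 1:2))

# Hypothetical helper: cell means of one file as a named vector.
file_means <- function(path) {
  d <- read.table(path, header = TRUE, sep = "\t")
  m <- tapply(d$Score, list(d$Condition, d$Reqresponse), mean)
  # Flatten the matrix column by column; names look like "Z.b".
  cell_names <- as.vector(outer(rownames(m), colnames(m), paste, sep = "."))
  setNames(as.vector(m), cell_names)
}

# One row per file, one column per Condition.Reqresponse combination.
master <- t(sapply(file_list, file_means))
rownames(master) <- basename(file_list)
master
```

Keeping the per-file means in a matrix like this leaves the column names as the only link back to the combinations, so it is worth checking them before further analysis.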

2 Comments

Hi Paul, is there any way I can do stuff like cut certain rows of data, exclude outliers etc before I calculate my means in this R-ish way? I like this much better but not sure how to do that without my loop. Also I want to do ANOVAs on this data (repeated measures, means of different columns) after. What would be the most effective way to do that? Make a df variable as I go?
I think it is best to create a new question where you refer to this one, and explain your additional questions.
