I have 600 tab-delimited .txt files that look like this:
barcode gene.symbol value
1 TCGA-61-2610-02A-01R-1141-07 15E1.2 -0.78175
2 TCGA-61-2610-02A-01R-1141-07 2'-PDE -1.0155
3 TCGA-61-2610-02A-01R-1141-07 7A5 0.029
4 TCGA-61-2610-02A-01R-1141-07 A1BG 0.96575
5 TCGA-61-2610-02A-01R-1141-07 A2BP1 -0.301
6 TCGA-61-2610-02A-01R-1141-07 A2M -2.21575
I want to combine all 600 files into one data frame in which gene.symbol supplies the row names and each file contributes one column of values, named by the first 12 characters of its barcode. After searching through SO, I think I've got a loop that does this, with one caveat. Here's what I have (I'm still learning R, so the code might look very crude):
n = 600
df <- read.delim(file="agilent1.txt")
df.tmp <- data.frame()
colnames(df) = c("barcode", "gene.symbol", levels(df$barcode))
df = df[2:3]
Once I have df with the first file's values, the loop adds the value column from each of the other files (the files are named agilent1.txt, agilent2.txt, etc.):
for (i in 2:n) {
df.tmp <- read.delim(file=paste("agilent", i, ".txt", sep=""))
a <- as.character(levels(df.tmp$barcode))
a <- substr(a, 1, 12)
df <- cbind(df, a = df.tmp$value)
}
Everything works, BUT in the cbind call, a = df.tmp$value names the new column a (which makes sense), whereas I want the value stored in a to be the column name. This is what I get:
gene.symbol TCGA-61-2614 a a a a
1 15E1.2 0.80475 -0.47375 -0.26825 -0.13425 -0.78175
2 2'-PDE -0.1348125 -0.1565625 0.19475 -0.3819375 -1.0155
3 7A5 2.2735 2.4405 0.902 1.248 0.029
4 A1BG 0.817166666666667 -0.0471666666666667 -0.1005 -0.283333333333333 0.96575
5 A2BP1 -0.811333333333333 -1.02566666666667 -0.494833333333333 -0.948 -0.301
6 A2M -0.719 -1.00575 -1.07275 0.517 -2.21575
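To make the problem concrete, here is a minimal sketch that reproduces the naming behavior without needing the real files. The two toy data frames and the numbers in them are made up for illustration; only the barcode format is taken from the sample above.

```r
# Toy stand-ins for the accumulated data frame and one additional file
# (hypothetical values, for illustration only).
df <- data.frame(gene.symbol = c("A1BG", "A2M"),
                 `TCGA-61-2614` = c(0.817, -0.719),
                 check.names = FALSE)
df.tmp <- data.frame(barcode = rep("TCGA-61-2610-02A-01R-1141-07", 2),
                     gene.symbol = c("A1BG", "A2M"),
                     value = c(0.96575, -2.21575))

# First 12 characters of the barcode: the column name I actually want.
a <- substr(as.character(df.tmp$barcode[1]), 1, 12)

# cbind() uses `a` as a literal argument name, not the string stored in a,
# so the new column ends up being called "a".
df <- cbind(df, a = df.tmp$value)
print(colnames(df))
```

Here a holds "TCGA-61-2610", yet the last column name printed is "a", which is the same thing that happens inside my loop.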
It sounds so easy in my mind but I can't seem to find the answer. Any help would be greatly appreciated.
Cheers,
Ahmet