#let's make some sample data first
names<- c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5")
metric1_set1 <- c(2.5,3.1,4.5,2.5,12,7.1,8.5,10,10.1,17.8,12.3,11,10,14,1.5)
metric1_set2 <- c(2.1,3.1,4.15,2.5,10,7.1,8.5,10,10.1,17.1,12.3,17.3,8,11,1.5)
metric1_set3 <- c(12.1,13.1,4.15,2.5,10.5,7.1,2.5,10,7.1,11.1,12.3,17.3,8,1.45,1.5)
dataset1 <- data.frame(names,metric1_set1,metric1_set2,metric1_set3)
names<- c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5")
metric2_set1 <- c(21.5,13.1,4.5,2.5,12,7.1,8.5,10,10.1,17.8,12.3,11,10,14,1.5)
metric2_set2 <- c(12.1,3.1,4.15,2.5,10,7.1,8.5,10,8.1,17.1,12.3,17.3,8,1.1,1.5)
metric2_set3 <- c(2.1,13.1,4.15,2.5,10.5,7.1,21.5,10,7.1,11.1,12.3,12.3,8,1.45,1.5)
dataset2 <- data.frame(names,metric2_set1,metric2_set2,metric2_set3)
The task is to find the top quartile of each column of dataset1 and then pull out the rows with the corresponding names from dataset2. The goal is to compute the correlation between these subsetted values.
quantiles <- apply(dataset1[2:4], 2, quantile, na.rm = TRUE)
This returns the quartiles, but the actual question is how to keep the names associated with, say, the top quartile of one dataset and drop every other row from both datasets.
Based on what @sconfluentus suggested we can change it to:
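topQuartile <- function(x) {
  # quantile() returns the 0%, 25%, 50%, 75% and 100% values, in that order
  y <- quantile(x, na.rm = TRUE)
  z <- y[4]  # 75th percentile, the cutoff for the top quartile (y[3] would be the median)
  return(z)
}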
quartile_dataset1 <- apply(dataset1[2:4], 2, topQuartile)
This works perfectly, but I also need something similar to the following:
topquartile_set1 <- subset(dataset1$metric1_set1, subset = (dataset1$metric1_set1 >= quartile_dataset1[1]))
I need similar code that works for each column and puts all subsets together in a single final data frame.
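To make the desired result concrete, here is a minimal sketch of one way this could look in base R. It rests on assumptions that are not stated above: that the rows of dataset1 and dataset2 line up one-to-one in the same order, that the k-th metric column of dataset1 pairs with the k-th metric column of dataset2, and that "top quartile" means values at or above the 75th percentile. The names `cutoffs`, `top_rows`, `top_quartile`, `source_column`, `dataset1_value` and `dataset2_value` are made up for illustration.

```r
# Sample data from the question, repeated so the sketch runs on its own
names <- rep(c("t1", "t2", "t3", "t4", "t5"), times = 3)
dataset1 <- data.frame(
  names,
  metric1_set1 = c(2.5, 3.1, 4.5, 2.5, 12, 7.1, 8.5, 10, 10.1, 17.8, 12.3, 11, 10, 14, 1.5),
  metric1_set2 = c(2.1, 3.1, 4.15, 2.5, 10, 7.1, 8.5, 10, 10.1, 17.1, 12.3, 17.3, 8, 11, 1.5),
  metric1_set3 = c(12.1, 13.1, 4.15, 2.5, 10.5, 7.1, 2.5, 10, 7.1, 11.1, 12.3, 17.3, 8, 1.45, 1.5)
)
dataset2 <- data.frame(
  names,
  metric2_set1 = c(21.5, 13.1, 4.5, 2.5, 12, 7.1, 8.5, 10, 10.1, 17.8, 12.3, 11, 10, 14, 1.5),
  metric2_set2 = c(12.1, 3.1, 4.15, 2.5, 10, 7.1, 8.5, 10, 8.1, 17.1, 12.3, 17.3, 8, 1.1, 1.5),
  metric2_set3 = c(2.1, 13.1, 4.15, 2.5, 10.5, 7.1, 21.5, 10, 7.1, 11.1, 12.3, 12.3, 8, 1.45, 1.5)
)

metric_cols <- names(dataset1)[-1]  # every column except `names`

# 75th percentile of each metric column in dataset1
cutoffs <- sapply(dataset1[metric_cols], quantile,
                  probs = 0.75, na.rm = TRUE, names = FALSE)

# For each column, keep the rows at or above its cutoff and take the
# same row positions from dataset2 (assumption: rows are aligned)
top_rows <- lapply(seq_along(metric_cols), function(i) {
  keep <- which(dataset1[[i + 1]] >= cutoffs[i])  # which() also drops NAs
  data.frame(
    source_column  = metric_cols[i],
    names          = dataset1$names[keep],
    dataset1_value = dataset1[[i + 1]][keep],
    dataset2_value = dataset2[[i + 1]][keep]
  )
})

# Stack the per-column subsets into a single data frame
top_quartile <- do.call(rbind, top_rows)

# Correlation between the two datasets within each subset
correlations <- sapply(
  split(top_quartile, top_quartile$source_column),
  function(d) cor(d$dataset1_value, d$dataset2_value)
)
```

The long format, with one row per retained value and a `source_column` label, keeps the subsets in one data frame even if they end up with different numbers of rows. If the rows of the two datasets were not aligned, the positional lookup into dataset2 would need to be replaced by a join on a proper key.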