I am trying to extract the rows of a chromosome/position data frame that fall within a related list of chromosomal intervals marking histone locations. My current "pipeline":
Import the original dataset containing chromosome and position:
> mapinfo<- read.table()
> colnames(mapinfo)<-c("CHR","M")
> mapinfo$CHR<- paste("chr",mapinfo$CHR,sep="")
> head(mapinfo)
CHR M
1 chrX 24072640
2 chr9 131463936
3 chr14 105176736
4 chr13 115000168
5 chr8 74791285
6 chr19 3676340
Import the bed file containing histone locations:
> bed<-read.table()
> names(bed)<- c("Chr","Start","Stop")
> head(bed)
Chr Start Stop
1 chr4 76806896 76807598
2 chrY 10034763 10036639
3 chr2 133036421 133037716
4 chr21 27227897 27228500
5 chr1 145036931 145041607
6 chr2 91777964 91779762
Generate chromosome-specific codes for use in subsetting the mapinfo data frame:
> Mcodes<- by(bed,bed$Chr,function(x){paste("M>=",x$Start,"&M<=",x$Stop,sep="",collapse="|")})
> Mcodes["chr1"]
chr1
"M>=130786932&M<=130787255|M>=133156512&M<=133156894..."
Subset original mapinfo dataset by chromosome:
> subs<- split(mapinfo,mapinfo$CHR)
At this point I can use the following line to subset my desired regions one chromosome at a time:
> CHR1<- eval(parse(text=paste0('subset(subs$chr1,',Mcodes["chr1"],')')))
I would like to subset every chromosome-specific data frame in "subs" by its corresponding entry in "Mcodes" without running that last line of code 24 separate times, especially as I have multiple bed files for various histones/histone variants that will eventually need to go through the same pipeline. Is there a way to loop/apply/something to make that possible?
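To make the goal concrete, here is a minimal sketch of the kind of loop I have in mind. It is an assumption about what could work, not a tested solution. It skips the Mcodes strings and eval(parse()), and instead compares positions against the Start/Stop columns directly. The toy values, the helper name in_regions, and the objects regions and result are all made up for illustration; mapinfo, bed and subs follow the structure shown above.

```r
# Toy stand-ins for the real tables (values are invented for illustration).
mapinfo <- data.frame(CHR = c("chr1", "chr1", "chr2", "chrX"),
                      M   = c(150, 900, 250, 500),
                      stringsAsFactors = FALSE)
bed <- data.frame(Chr   = c("chr1", "chr1", "chr2"),
                  Start = c(100, 800, 300),
                  Stop  = c(200, 850, 400),
                  stringsAsFactors = FALSE)

# Split both tables by chromosome so matching pieces share a name.
subs    <- split(mapinfo, mapinfo$CHR)
regions <- split(bed, bed$Chr)

# Hypothetical helper: keep the positions of one chromosome that fall
# inside any of that chromosome's intervals (bounds inclusive).
in_regions <- function(pos_df, reg_df) {
  # Chromosome absent from the bed file: return an empty data frame.
  if (is.null(reg_df)) return(pos_df[0, , drop = FALSE])
  keep <- vapply(pos_df$M,
                 function(m) any(m >= reg_df$Start & m <= reg_df$Stop),
                 logical(1))
  pos_df[keep, , drop = FALSE]
}

# Apply the helper to every chromosome in one pass.
result <- lapply(names(subs),
                 function(chr) in_regions(subs[[chr]], regions[[chr]]))
names(result) <- names(subs)
```

With something like this, result$chr1 would play the role of CHR1 above, and the same code could be rerun for each bed file by swapping in a different bed data frame.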
Sorry if this seems like a trivial question - I am still very new to the R/programming game. Thanks for any advice.