for loop with dplyr

Question

I have a bunch of files I read in manually as such:

# gel above replicates

    A_gel <-read.delim("XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_gel <-read.delim("XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
# gel below replicates
    
    A_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")

I would like to change all the columns of these files and arrange by the start column with something like this:

colnames(A_gel) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")
    
A_gel <- A_gel %>%
      arrange(A_gel$Start)

Instead, I would like to use a for loop for all files using R.

Konrad Rudolph · Accepted Answer · 2021-06-30 16:58:12Z

Never create multiple variables following the same pattern. The properly supported solution for this general problem is the use of lists (i.e. instead of having variables A_gel, B_gel, …, you have one variable gel, which is a list that contains your individual data.frames; you can also assign names to these individual items, though in your case that doesn’t seem necessary).

Then you can use e.g. lapply to run over your file paths and read the data of the different files into that list:

gel = lapply(gel_filenames, read.delim)
below_gel = lapply(below_gel_filenames, read.delim)

… and likewise you can put your arrangement code into a function and apply that, changing the above to:

read_bed = function (filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}

# …

gel = lapply(gel_filenames, read_bed)

Better yet, use purrr::map_dfr to read all data into a single combined table:

gel = gel_filenames %>%
    setNames(., .) %>%
    map_dfr(read_bed, .id = 'Filename')

(The setNames(., .) step is necessary since read_dfr assigns the names of the input vector to the added ID column.)

This will create one master table for the “GEL” dat, which has an added ID column for the original filename (you’ll probably want to extract just some ID from that, using tidyr::extract).

Collectives™ on Stack Overflow

for loop with dplyr

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related