I have a bunch of files that I currently read in manually, like this:

    # gel above replicates

    A_gel <-read.delim("XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_gel <-read.delim("XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_gel <- read.delim("XL2_3_S37_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_gel <- read.delim("XL1_3_S35_L004_R1_001_w_XL1_3_S35_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    # gel below replicates
    
    A_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    B_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL2_3b_S38_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    C_below_gel <- read.delim("XL2_3b_S38_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")
    
    D_below_gel <- read.delim("XL1_3b_S36_L004_R1_001_w_XL1_3b_S36_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed")

I would like to rename all the columns of each of these tables and sort the rows by the Start column, with something like this:

    colnames(A_gel) <- c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")

    A_gel <- A_gel %>%
        arrange(Start)

Instead of repeating this for every file, I would like to use a for loop in R that does it for all of them.

1 Answer

Never create multiple variables following the same pattern. The properly supported solution for this general problem is to use a list: instead of having variables A_gel, B_gel, …, you have one variable gel, which is a list that contains your individual data frames. You can also assign names to these individual items, though in your case that doesn’t seem necessary.
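As a minimal sketch of what such a list looks like, the snippet below builds one from two made-up miniature tables (they stand in for the real files and are not actual data) and shows how named items can be accessed:

```r
# Made-up miniature tables standing in for the real data.frames.
gel = list(
    A = data.frame(Start = c(30, 10)),
    B = data.frame(Start = c(5, 20))
)

names(gel)    # the names assigned to the individual items
gel$A         # access one table by name
gel[["B"]]    # equivalent access; also works with a name stored in a variable
length(gel)   # number of tables in the list
```

Everything that previously needed one line per variable can now be written once and applied to every item of the list.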

Then you can use e.g. lapply to run over your file paths and read the data of the different files into that list:

    gel = lapply(gel_filenames, read.delim)
    below_gel = lapply(below_gel_filenames, read.delim)
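The vectors gel_filenames and below_gel_filenames are not defined in the question. One possible way to build them is sketched here. It assumes that all files sit in a single directory (data_dir is a hypothetical path) and that the "below gel" files can be recognised by the `_3b_` tag in their names, as in the filenames shown in the question:

```r
# Assumption: all input files live in one directory; adjust data_dir as needed.
data_dir = "."
all_files = list.files(
    data_dir,
    pattern = "\\.compressed\\.bed$",
    full.names = TRUE
)

# Assumption: "below gel" files carry a "_3b_" tag, "above gel" files do not.
is_below = grepl("_3b_", basename(all_files), fixed = TRUE)

gel_filenames = all_files[! is_below]
below_gel_filenames = all_files[is_below]
```

If the two groups are not distinguishable by name like this, writing the two filename vectors out by hand with c(…) works just as well.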

… and likewise you can put your renaming and sorting code into a function and apply that, changing the lapply call to:

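    library(dplyr)  # provides %>% and arrange()

    # Read one file, rename its columns and sort the rows by Start.
    read_bed = function (filename) {
        read.delim(filename) %>%
            setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
            arrange(Start)
    }

    # …

    gel = lapply(gel_filenames, read_bed)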

Better yet, use purrr::map_dfr to read all data into a single combined table:

    gel = gel_filenames %>%
        setNames(., .) %>%
        map_dfr(read_bed, .id = 'Filename')

(The setNames(., .) step is necessary since map_dfr fills the added ID column with the names of the input vector; without names you would only get the position of each file.)
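As of purrr 1.0.0, map_dfr is marked as superseded in favour of map followed by list_rbind. The following is a sketch of the same idea in that style, not a replacement for the code above. To make it run on its own it writes two tiny made-up files (make_demo_file and demo_filenames are hypothetical helpers, not part of the question), and it repeats the read_bed helper:

```r
library(dplyr)
library(purrr)

# Same helper as above: read one file, rename the columns, sort by Start.
read_bed = function (filename) {
    read.delim(filename) %>%
        setNames(c("Chromosome", "Start", "End", "LogPVal", "LogFC", "Strand")) %>%
        arrange(Start)
}

# Hypothetical helper: writes a small tab-separated file with a header line
# (read.delim treats the first line as a header by default).
make_demo_file = function (starts) {
    path = tempfile(fileext = ".bed")
    demo = data.frame(
        chr = "chr1", start = starts, end = starts + 10,
        logp = 1, logfc = 2, strand = "+"
    )
    write.table(demo, path, sep = "\t", quote = FALSE, row.names = FALSE)
    path
}

demo_filenames = c(make_demo_file(c(300, 100)), make_demo_file(c(50, 20, 40)))

gel = demo_filenames %>%
    set_names() %>%                       # name each path after itself
    map(read_bed) %>%                     # list of sorted tables
    list_rbind(names_to = "Filename")     # one table plus a Filename column
```

Here set_names() without further arguments plays the role of setNames(., .), and names_to plays the role of .id.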

This will create one master table for the “GEL” data, with an added ID column holding the original filename (you’ll probably want to extract just some ID from that, using tidyr::extract).
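A sketch of that extraction step follows. The stand-in table, the new column names First and Second, and the regular expression are all assumptions made for illustration; the pattern simply captures the XL… prefix on either side of the `_w_` separator seen in the question's filenames:

```r
library(tidyr)

# Stand-in for the combined table; only the Filename column matters here.
gel = data.frame(
    Filename = c(
        "XL1_3_S35_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed",
        "XL2_3_S37_L004_R1_001_w_XL2_3_S37_L004_R1_001_01.basedon.peaks.l2inputnormnew.bed.compressed.bed"
    ),
    Start = c(100, 200)
)

# One capture group per new column; remove = FALSE keeps the original Filename.
gel = tidyr::extract(
    gel, Filename,
    into = c("First", "Second"),
    regex = "^(XL[0-9]+)_.*_w_(XL[0-9]+)_",
    remove = FALSE
)
```

If the Filename column holds full paths rather than bare filenames, the `^` anchor will not match; in that case apply basename() to the column first. The call is written as tidyr::extract because magrittr exports a different function with the same name.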
