Adding columns/values on a list of dataframes from a second list

Question

I have a list of 58 data frames called nafilt_persample.ngsrep. Each element is named after an individual ID (SVT_01 ... SVT_58) and contains 15 columns of character or numeric values, like this:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27
2:   c.1849G>T        p.V617F                G                 T      2455        78

I need to add two columns, lCI and uCI, to each df in the list. The values come from a second list, cint, which is organized by the same IDs (SVT_) and then by gene. For one sample it looks like this:

$DNMT3A
[1] 0.006285366 0.013826599
attr(,"conf.level")
[1] 0.95

$JAK2
[1] 0.02441547 0.03828421
attr(,"conf.level")
[1] 0.95

I would like to obtain a result like this:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI  uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.006 0.013
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

So far I have tried this but without success:

merged.list <- list()

for (i in names(nafilt_persample.ngsrep)){ for (k in nafilt_persample.ngsrep[[i]]$Hugo_Symbol){
  merged.list[[i]] <- cbind(nafilt_persample.ngsrep[[i]], cint[[i]][[k]][1], cint[[i]][[k]][2])
    }
}

The problem is that although the two columns are added, they contain only the values from the last loop iteration. For the SVT_01 example shown above, this is the result:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI  uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.024  0.038
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

That is: the CI of JAK2 is duplicated onto the DNMT3A row. How can I fix this? Hope I provided enough info

Your merged.list seems to have no names, whereas i iterates over the names of the list? Perhaps you want to loop over the sequence of the list.
– akrun, Commented Oct 29, 2022 at 16:58

akrun · Accepted Answer · 2022-10-29 17:04:20Z


We could do

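nafilt_persample.ngsrep <- Map(function(dat, ci) {
    # ci[[gene]] is c(lower, upper); look up the entry for each row's gene
    genes <- as.character(dat$Hugo_Symbol)
    dat$lCI <- vapply(ci[genes], `[`, numeric(1), 1, USE.NAMES = FALSE)
    dat$uCI <- vapply(ci[genes], `[`, numeric(1), 2, USE.NAMES = FALSE)
    dat
}, nafilt_persample.ngsrep, cint)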

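Or with a for loop (this assumes cint carries the same names as nafilt_persample.ngsrep)

for (nm in names(nafilt_persample.ngsrep)) {
    genes <- as.character(nafilt_persample.ngsrep[[nm]]$Hugo_Symbol)
    # cint[[nm]][[gene]] is c(lower, upper)
    nafilt_persample.ngsrep[[nm]]$lCI <-
        vapply(cint[[nm]][genes], `[`, numeric(1), 1, USE.NAMES = FALSE)
    nafilt_persample.ngsrep[[nm]]$uCI <-
        vapply(cint[[nm]][genes], `[`, numeric(1), 2, USE.NAMES = FALSE)
}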



AndS. · Accepted Answer · 2022-10-30 13:50:24Z

Here is another option. I would recommend making a small reproducible example in the future. Here is one:

library(tidyverse)

#example data
nafilt_persample.ngsrep <- map(1:3, ~tibble(Tumor_Sample_Barcode =c(glue::glue("SVT_0{.x}")),
                 Hugo_Symbol = c("DNMT3A", "JAK2"))) |>
  `names<-`(paste0("SVT_0", 1:3))

set.seed(32)
cint <- map(1:3, ~ list(DNMT3A = c(runif(1, 0, 0.3), runif(1, 0.3, 1)),
                        JAK2 = c(runif(1, 0, 0.3), runif(1, 0.3, 1))))


#solution
map2(nafilt_persample.ngsrep, cint, 
    \(dat, ci){
      col_add <- tibble(name = names(ci),
                        dat = ci) |>
        unnest_wider(dat, names_repair = \(x) c("Hugo_Symbol", "lCI",  "uCI")) |>
        suppressMessages()
      
      left_join(dat, col_add, by = "Hugo_Symbol")
    })
#> $SVT_01
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_01               DNMT3A      0.152 0.716
#> 2 SVT_01               JAK2        0.243 0.810
#> 
#> $SVT_02
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol    lCI   uCI
#>   <chr>                <chr>        <dbl> <dbl>
#> 1 SVT_02               DNMT3A      0.0456 0.969
#> 2 SVT_02               JAK2        0.226  0.896
#> 
#> $SVT_03
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_03               DNMT3A      0.202 0.571
#> 2 SVT_03               JAK2        0.197 0.525

Explanation:

map2: iterates across two lists in parallel.
\(dat, ci): anonymous function that receives an element of the first list in map2 as dat and the matching element of the second as ci.
col_add: name of a new data frame that contains the columns you are trying to add.
tibble: creates a new data frame where name holds the names of the list elements (the same values as Hugo_Symbol) and dat holds the data you are adding from ci. Note that this dat is different from the one defined by the anonymous function; the clash is just a coincidence because I use the word dat a lot in my code. That's not good practice, so I recommend changing it.
unnest_wider: takes the values of ci and puts them in two separate columns rather than in one nested column. These columns do not have names, so I use names_repair to add them.
suppressMessages: stops a message telling you that the unnested columns do not have names. You get this message once per data frame in the list, so I suppress it to keep the output clean.
left_join: adds the new columns to the original data by joining on Hugo_Symbol.

Overall, this code works because your two lists are in the same order. If your lists are not in the same order, then you are going to need to refine this to make sure that the Tumor_Sample_Barcode match across the lists.
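
If the two lists might not be in the same order, one option is to align them by name before adding the columns. The following is a minimal base R sketch on toy data, not a drop-in for the real objects: it assumes cint is named by sample ID and that each gene entry is a length-2 numeric vector c(lower, upper). The names samples, cint_aligned and add_ci are made up for illustration.

```r
# Toy data: cint is deliberately in a different order than samples,
# and SVT_02 has a gene (TET2) with no interval.
samples <- list(
  SVT_01 = data.frame(Tumor_Sample_Barcode = "SVT_01",
                      Hugo_Symbol = c("DNMT3A", "JAK2"),
                      stringsAsFactors = FALSE),
  SVT_02 = data.frame(Tumor_Sample_Barcode = "SVT_02",
                      Hugo_Symbol = c("JAK2", "TET2"),
                      stringsAsFactors = FALSE)
)
cint <- list(
  SVT_02 = list(JAK2 = c(0.30, 0.40)),
  SVT_01 = list(DNMT3A = c(0.006, 0.014), JAK2 = c(0.024, 0.038))
)

# Align cint to the order of samples by sample ID; fail early if an ID is missing.
stopifnot(all(names(samples) %in% names(cint)))
cint_aligned <- cint[names(samples)]

# Hypothetical helper: look up each row's gene by name, NA when no interval exists.
add_ci <- function(dat, ci) {
  idx <- match(dat$Hugo_Symbol, names(ci))
  bound <- function(k) {
    vapply(idx, function(j) if (is.na(j)) NA_real_ else ci[[j]][k], numeric(1))
  }
  dat$lCI <- bound(1)  # lower bound
  dat$uCI <- bound(2)  # upper bound
  dat
}

result <- Map(add_ci, samples, cint_aligned)
```

Using match() keeps the original row order and column order of each data frame, which a merge-style join would not necessarily do.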

Worked perfectly. Thank you. I apologize for not leaving a reproducible example. I still cannot quite grasp what your code does; if you don't mind explaining a little, I would be very glad (and maybe also why my loop didn't work). Otherwise, thanks a lot for the solution!
I added some explanation for you. Let me know if anything is unclear. I'm not totally sure why yours isn't working, but I suspect it is your indexing. In my opinion it's tougher to troubleshoot someone else's code than it is to come up with my own solution to the question.
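
On why the loop in the question repeats the last gene's interval: judging from the posted code, each pass of the inner loop rebuilds merged.list[[i]] from the unmodified data frame, and cbind() recycles the single value cint[[i]][[k]][1] down every row, so only the last gene survives. The toy reproduction below is a sketch with a plain data.frame and made-up object names (dat, ci, out, fixed); the real data appears to be a data.table, but the recycling behaviour should be the same.

```r
# Toy stand-ins for one sample's data frame and its intervals.
dat <- data.frame(Hugo_Symbol = c("DNMT3A", "JAK2"), stringsAsFactors = FALSE)
ci  <- list(DNMT3A = c(0.006, 0.014), JAK2 = c(0.024, 0.038))

# Pattern from the question: the result is overwritten on every pass,
# and each scalar is recycled across all rows.
out <- NULL
for (k in dat$Hugo_Symbol) {
  out <- cbind(dat, lCI = ci[[k]][1], uCI = ci[[k]][2])
}
out$lCI  # every row carries the JAK2 lower bound

# Row-wise alternative: fill one cell per row instead of a whole column.
fixed <- dat
fixed$lCI <- NA_real_
fixed$uCI <- NA_real_
for (r in seq_len(nrow(fixed))) {
  k <- fixed$Hugo_Symbol[r]
  fixed$lCI[r] <- ci[[k]][1]
  fixed$uCI[r] <- ci[[k]][2]
}
```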
