I have a list of 58 data frames called nafilt_persample.ngsrep. The data frames are named by individual ID (SVT_01 ... SVT_58), and each contains 15 columns of character or numeric values, for example:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27
2:   c.1849G>T        p.V617F                G                 T      2455        78

I need to add two columns, lCI and uCI, to each data frame in the list. The values come from a second list, cint, which is keyed by the same sample ID (SVT_) and then by gene. For one sample it looks like this:

$DNMT3A
[1] 0.006285366 0.013826599
attr(,"conf.level")
[1] 0.95

$JAK2
[1] 0.02441547 0.03828421
attr(,"conf.level")
[1] 0.95

I would like to obtain a result like this:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI  uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.006 0.013
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

So far I have tried this but without success:

merged.list <- list()

for (i in names(nafilt_persample.ngsrep)){ for (k in nafilt_persample.ngsrep[[i]]$Hugo_Symbol){
  merged.list[[i]] <- cbind(nafilt_persample.ngsrep[[i]], cint[[i]][[k]][1], cint[[i]][[k]][2])
    }
}

The problem is that although the two columns are added, only the values from the last loop iteration are used. For SVT_01 shown above, this is the result:

   Tumor_Sample_Barcode Hugo_Symbol Chromosome Start_Position End_Position Variant_Type Variant_Classification coverage   VAF
1:               SVT_01      DNMT3A       chr2       25464495     25464495          SNP      Missense_Mutation     2835 0.011
2:               SVT_01        JAK2       chr9        5073770      5073770          SNP      Missense_Mutation     2533 0.028
   cDNA_Change Protein_Change Reference_Allele Tumor_Seq_Allele2 ref_reads var_reads lCI  uCI
1:   c.2018G>T    p.Gly673Val                C                 A      2808        27 0.024  0.038
2:   c.1849G>T        p.V617F                G                 T      2455        78 0.024 0.038

That is, the CI of JAK2 is duplicated onto the DNMT3A row. How can I fix this? I hope I have provided enough information.

1
  • Your merged.list seems to have no names, whereas i iterates over the names of the list. Perhaps you want to loop over the sequence of the list? (Commented Oct 29, 2022 at 16:58)

2 Answers

1

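We could do

nafilt_persample.ngsrep <- Map(function(dat, nm) {
    # stack the per-gene intervals in row order: one row per variant,
    # column 1 = lower bound, column 2 = upper bound
    ci <- do.call(rbind, nm[dat$Hugo_Symbol])
    dat$lCI <- ci[, 1]
    dat$uCI <- ci[, 2]
    dat
}, nafilt_persample.ngsrep, cint)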

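Or with a for loop (this assumes cint is named by the same sample IDs):

for (nm in names(nafilt_persample.ngsrep)) {
    # look up one interval per row, then split it into the two columns
    ci <- do.call(rbind, cint[[nm]][nafilt_persample.ngsrep[[nm]]$Hugo_Symbol])
    nafilt_persample.ngsrep[[nm]]$lCI <- ci[, 1]
    nafilt_persample.ngsrep[[nm]]$uCI <- ci[, 2]
}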
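
For a quick sanity check, here is a minimal sketch using the two SVT_01 genes and intervals from the question. The objects dat_list and ci_list are hypothetical stand-ins for the real lists, and the sketch assumes every gene in Hugo_Symbol has an entry in the matching CI list.

```r
# Toy stand-ins for nafilt_persample.ngsrep and cint (hypothetical names)
dat_list <- list(SVT_01 = data.frame(
  Tumor_Sample_Barcode = "SVT_01",
  Hugo_Symbol = c("DNMT3A", "JAK2"),
  stringsAsFactors = FALSE
))
ci_list <- list(SVT_01 = list(
  DNMT3A = c(0.006285366, 0.013826599),
  JAK2   = c(0.02441547, 0.03828421)
))

res <- Map(function(dat, ci) {
  # Stack the per-gene intervals in row order: one row per variant
  bounds <- do.call(rbind, ci[dat$Hugo_Symbol])
  dat$lCI <- unname(bounds[, 1])
  dat$uCI <- unname(bounds[, 2])
  dat
}, dat_list, ci_list)

res$SVT_01
```

Each row should end up with the interval of its own gene, so the DNMT3A row gets the DNMT3A bounds rather than a copy of the JAK2 ones.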

0

Here is another option. I would recommend making a small reproducible example in the future. Here is one:

library(tidyverse)

#example data
nafilt_persample.ngsrep <- map(1:3, ~tibble(Tumor_Sample_Barcode =c(glue::glue("SVT_0{.x}")),
                 Hugo_Symbol = c("DNMT3A", "JAK2"))) |>
  `names<-`(paste0("SVT_0", 1:3))

set.seed(32)
cint <- map(1:3, ~ list(DNMT3A = c(runif(1, 0, 0.3), runif(1, 0.3, 1)),
                        JAK2 = c(runif(1, 0, 0.3), runif(1, 0.3, 1))))


#solution
map2(nafilt_persample.ngsrep, cint, 
    \(dat, ci){
      col_add <- tibble(name = names(ci),
                        dat = ci) |>
        unnest_wider(dat, names_repair = \(x) c("Hugo_Symbol", "lCI",  "uCI")) |>
        suppressMessages()
      
      left_join(dat, col_add, by = "Hugo_Symbol")
    })
#> $SVT_01
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_01               DNMT3A      0.152 0.716
#> 2 SVT_01               JAK2        0.243 0.810
#> 
#> $SVT_02
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol    lCI   uCI
#>   <chr>                <chr>        <dbl> <dbl>
#> 1 SVT_02               DNMT3A      0.0456 0.969
#> 2 SVT_02               JAK2        0.226  0.896
#> 
#> $SVT_03
#> # A tibble: 2 x 4
#>   Tumor_Sample_Barcode Hugo_Symbol   lCI   uCI
#>   <chr>                <chr>       <dbl> <dbl>
#> 1 SVT_03               DNMT3A      0.202 0.571
#> 2 SVT_03               JAK2        0.197 0.525

Explanation:

  • map2: iterates across 2 lists in parallel.
  • \(dat, ci): anonymous function that refers to the current element of the first list in map2 as dat and of the second as ci.
  • col_add: name of a new dataframe that contains the columns that you are trying to add.
  • tibble: creates a new dataframe where name holds the names of the list object (the same values as Hugo_Symbol) and dat holds the data you are adding from ci. Note that this dat is different from the one defined by the anonymous function; the shared name is just a coincidence because I use the word dat a lot in my code. That's not good practice, so I recommend changing it.
  • unnest_wider: takes the values of ci and puts them in two separate columns rather than in one nested column. Note that these columns do not have names, so I use names_repair to add them.
  • suppressMessages: stops a message that tells you that the unnested columns do not have names. You get this message once for each dataframe in the list, so I suppress it to keep the output clean.
  • left_join: adds the new columns to the original data by joining on Hugo_Symbol.

Overall, this code works because your two lists are in the same order. If your lists are not in the same order, then you are going to need to refine this to make sure that the Tumor_Sample_Barcode values match across the lists.
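
One way to guard against ordering differences is to reorder the CI list by name before pairing. This is only a sketch: it assumes both lists carry the sample IDs as names, and the toy objects samples and intervals are hypothetical.

```r
# Hypothetical toy lists whose elements are deliberately in different orders
samples <- list(
  SVT_01 = data.frame(Hugo_Symbol = "JAK2", stringsAsFactors = FALSE),
  SVT_02 = data.frame(Hugo_Symbol = "DNMT3A", stringsAsFactors = FALSE)
)
intervals <- list(
  SVT_02 = list(DNMT3A = c(0.10, 0.20)),
  SVT_01 = list(JAK2 = c(0.02, 0.04))
)

# Fail early if any sample has no matching CI entry
stopifnot(all(names(samples) %in% names(intervals)))

# Reorder the CI list by sample ID so that positional pairing is safe
intervals <- intervals[names(samples)]

names(intervals)
```

After the reordering step, a positional iterator such as map2 or Map pairs each dataframe with the intervals of the same sample.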

2 Comments

Worked perfectly. Thank you. I apologize for not leaving a reproducible example. I still cannot quite follow what your code does, so if you don't mind explaining a little bit I would be very glad (maybe also why my loop didn't work). Otherwise, thanks a lot for the solution!
I added some explanation for you. Let me know if anything is unclear. I'm not totally sure why yours isn't working, but I suspect it is your indexing. In my opinion it's tougher to troubleshoot someone else's code than it is to come up with my own solution to the question.
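
On why the original loop misbehaves, a likely explanation is that cbind() recycles a single number down every row, and each pass of the inner loop overwrites merged.list[[i]], so only the last gene's interval survives. A minimal sketch with a toy data frame (df and ci are hypothetical names, and the interval values are rounded for illustration):

```r
df <- data.frame(Hugo_Symbol = c("DNMT3A", "JAK2"), stringsAsFactors = FALSE)
ci <- list(DNMT3A = c(0.006, 0.014), JAK2 = c(0.024, 0.038))

# A length-1 value is recycled to all rows, as in the original inner loop
recycled <- cbind(df, lCI = ci[["JAK2"]][1], uCI = ci[["JAK2"]][2])

# Looking up one interval per row avoids the recycling
fixed <- cbind(
  df,
  lCI = vapply(df$Hugo_Symbol, function(g) ci[[g]][1], numeric(1), USE.NAMES = FALSE),
  uCI = vapply(df$Hugo_Symbol, function(g) ci[[g]][2], numeric(1), USE.NAMES = FALSE)
)
```

Here recycled repeats the JAK2 bounds on both rows, which matches the symptom in the question, while fixed carries a separate interval for each gene.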
