I would like to split the following data frame based on the final numbers of each element. So I would like 6 new data frames each with two elements. Here is my attempt at obtaining a data frame of the first subset containing just "ABCD-1" and "ABCC-1", but it doesn't seem to be working.

library("reshape2")
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
bar_f

bar_f$SampleID <- colsplit(bar_f$Barcode, pattern = "-", names = c("a","b"))$b
bar_f.s1 <- subset(barcode_file, barcode_file$SampleID == "1")
bar_f.s1

Can you help?

Thank you,

Abigail

3 Answers

The main idea is to create a factor that defines the grouping for the split. One way is to extract the digit pattern from the provided variable Barcode using a regular expression, and then convert the resulting character vector of digits to a factor with as.factor(). Other regular expression techniques would also do the job, as would the more user-friendly wrapper functions from the stringr package used in the second example (the tidyverse-ish approach).

Example 1

A base R solution using split:

# The provided data
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

factor_for_split <- regmatches(x = bar_f$Barcode,
                               m = regexpr(pattern = "[[:digit:]]",
                                           text = bar_f$Barcode))
factor_for_split
#>  [1] "1" "1" "2" "2" "3" "3" "4" "4" "5" "5" "6" "6"

# Create a list of 6 data frames as asked
lst <- split(x = bar_f, f = as.factor(factor_for_split))
lst
#> $`1`
#>   Barcode
#> 1  ABCD-1
#> 2  ABCC-1
#> 
#> $`2`
#>   Barcode
#> 3  ABCD-2
#> 4  ABCC-2
#> 
#> $`3`
#>   Barcode
#> 5  ABCD-3
#> 6  ABCC-3
#> 
#> $`4`
#>   Barcode
#> 7  ABCD-4
#> 8  ABCC-4
#> 
#> $`5`
#>    Barcode
#> 9   ABCD-5
#> 10  ABCC-5
#> 
#> $`6`
#>    Barcode
#> 11  ABCD-6
#> 12  ABCC-6

# Edit names of the list
names(lst) <- paste0("df_", names(lst))

# Assign each data frame from the list to a data frame object in the global
# environment
for(name in names(lst)) {
  assign(name, lst[[name]])
}

Created on 2020-02-24 by the reprex package (v0.3.0)
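
A note on the pattern used above: "[[:digit:]]" with regexpr() picks up only the first single digit in each string, which is enough for the barcodes in the question. If the barcodes could contain digits in the prefix or have multi-digit suffixes, anchoring the pattern to the end of the string may be safer. The following is a minimal sketch under that assumption; the barcodes in `x` are hypothetical and not part of the original data.

```r
# Hypothetical barcodes: a digit in the prefix and a two-digit suffix
x <- c("AB1D-10", "ABCC-10", "AB1D-2", "ABCC-2")

# Pattern from Example 1: returns the first single digit found in each string
first_digit <- regmatches(x, regexpr("[[:digit:]]", x))

# Anchored pattern: returns the full run of digits at the end of each string
final_number <- regmatches(x, regexpr("[[:digit:]]+$", x))

# Split on the trailing number rather than on the first digit
groups <- split(x, final_number)
groups
```

Here `first_digit` would put "AB1D-2" in the same group as the "-10" barcodes, while `final_number` groups by the suffix only.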

Example 2

And, if you prefer, here is a tidyverse-ish approach:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

bar_f %>% 
  mutate(factor_for_split = str_extract(string = Barcode,
                                        pattern = "[[:digit:]]")) %>% 
  group_split(factor_for_split)
#> [[1]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-1  1               
#> 2 ABCC-1  1               
#> 
#> [[2]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-2  2               
#> 2 ABCC-2  2               
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-3  3               
#> 2 ABCC-3  3               
#> 
#> [[4]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-4  4               
#> 2 ABCC-4  4               
#> 
#> [[5]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-5  5               
#> 2 ABCC-5  5               
#> 
#> [[6]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-6  6               
#> 2 ABCC-6  6               
#> 
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: Barcode <fct>, factor_for_split <chr>

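# This assumes the result of the pipeline above was stored in a list,
# e.g. lst <- bar_f %>% ... %>% group_split(factor_for_split)
names(lst) <- paste0("df_", seq_along(lst))
# Assign each data frame from the list to an object in the global environment
for(name in names(lst)) {
  assign(name, lst[[name]])
}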

Created on 2020-02-24 by the reprex package (v0.3.0)

2 Comments

Thanks Valentin! Once I type split(x = bar_f, f = as.factor(factor_for_split)), how do I then assign each to be a data frame on its own, to have 6 new data frames say bar_f1, bar_f2,...,bar_f6? Thanks!
Hi @Abigail575, in that case you can use the assign function on each element (data.frame) of the list created with split. I updated my answer accordingly. It wasn't clear to me that you wanted each data frame as a separate object in the global environment. In my opinion, the more canonical R way of dealing with such cases is to keep the objects in a list.
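
To illustrate the point about keeping the data frames in a list, here is a minimal sketch in base R. The names `bar_f1` to `bar_f6` follow the comment above; `sample_id` and `row_counts` are names made up for this illustration, and sub(".*-", "", ...) is just one possible way to get the part after the hyphen.

```r
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

# Keep only what follows the last "-" and use it as the grouping variable
sample_id <- sub(".*-", "", bar_f$Barcode)

# One list holding the six data frames, named bar_f1 ... bar_f6
lst <- split(bar_f, sample_id)
names(lst) <- paste0("bar_f", names(lst))

# A single data frame can be pulled out by name when needed
lst[["bar_f1"]]

# The same operation can be applied to every data frame without a loop
row_counts <- vapply(lst, nrow, integer(1))
row_counts
```

With a list, nothing has to be created in the global environment, and later steps can use lapply() or vapply() instead of referring to six separate object names.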

You can try:

library(tidyverse)
separate(bar_f, Barcode, into = letters[1:2], sep ="-")

and the full tidyverse way could look like

bar_f %>% 
  separate(Barcode, into = letters[1:2], sep ="-") %>% 
  filter(b == 1)
     a b
1 ABCD 1
2 ABCC 1

In base R you can try gsub() to remove the lowercase letters, the uppercase letters and the hyphen:

bar_f$SampleID <- gsub("[A-Za-z-]", "", bar_f$Barcode)
head(bar_f)
  Barcode SampleID
1  ABCD-1        1
2  ABCC-1        1
3  ABCD-2        2
4  ABCC-2        2
5  ABCD-3        3
6  ABCC-3        3

Here is another solution using built-in functions:

# Strip every non-digit character and split on what is left
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))

for(nm in names(dfs)) assign(nm, dfs[[nm]])
