Splitting a data frame based on character string

Question

I would like to split the following data frame based on the final numbers of each element. So I would like 6 new data frames each with two elements. Here is my attempt at obtaining a data frame of the first subset containing just "ABCD-1" and "ABCC-1", but it doesn't seem to be working.

library("reshape2")
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
"ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)
bar_f

bar_f$SampleID <- colsplit(bar_f$Barcode, pattern = "-", names = c("a","b"))$b
bar_f.s1 <- subset(barcode_file, barcode_file$SampleID == "1")
bar_f.s1

Can you help?

Thank you,

Abigail

Valentin_Ștefan · Accepted Answer · 2020-02-25 13:33:33Z

The main idea is to create a factor that defines the grouping for the split. One way is to extract the digit pattern from the provided variable Barcode using a regular expression, then convert the resulting character vector of digits to a factor with as.factor(). Other regular-expression techniques would also work, as would the more user-friendly wrapper functions from the stringr package, as in the second example (the tidyverse-ish approach).

Example 1

A base R solution using split:

# The provided data
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

factor_for_split <- regmatches(x = bar_f$Barcode,
                               m = regexpr(pattern = "[[:digit:]]",
                                           text = bar_f$Barcode))
factor_for_split
#>  [1] "1" "1" "2" "2" "3" "3" "4" "4" "5" "5" "6" "6"

# Create a list of 6 data frames as asked
lst <- split(x = bar_f, f = as.factor(factor_for_split))
lst
#> $`1`
#>   Barcode
#> 1  ABCD-1
#> 2  ABCC-1
#> 
#> $`2`
#>   Barcode
#> 3  ABCD-2
#> 4  ABCC-2
#> 
#> $`3`
#>   Barcode
#> 5  ABCD-3
#> 6  ABCC-3
#> 
#> $`4`
#>   Barcode
#> 7  ABCD-4
#> 8  ABCC-4
#> 
#> $`5`
#>    Barcode
#> 9   ABCD-5
#> 10  ABCC-5
#> 
#> $`6`
#>    Barcode
#> 11  ABCD-6
#> 12  ABCC-6

# Edit names of the list
names(lst) <- paste0("df_", names(lst))

# Assign each data frame from the list to a data frame object in the global
# environment
for(name in names(lst)) {
  assign(name, lst[[name]])
}

Created on 2020-02-24 by the reprex package (v0.3.0)
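One caveat with the pattern "[[:digit:]]" is that regexpr() returns only the first single digit it finds, so a barcode such as "ABCD-10" would be grouped under "1". The sketch below is one possible variation, not part of the original answer: it anchors the pattern to the end of the string and converts the result to integer so the groups sort numerically. The barcodes and the names bar_f2, suffix and lst2 are made up for illustration.

```r
# Hypothetical barcodes with a multi-digit suffix (not the question's data)
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-10", "ABCC-10")
bar_f2 <- data.frame(Barcode, stringsAsFactors = FALSE)

# "[[:digit:]]+$" matches the whole run of digits at the end of each string
suffix <- regmatches(x = bar_f2$Barcode,
                     m = regexpr(pattern = "[[:digit:]]+$",
                                 text = bar_f2$Barcode))

# Splitting on integers orders the groups 1, 2, 10 rather than "1", "10", "2"
lst2 <- split(x = bar_f2, f = as.integer(suffix))
names(lst2)
```

Note that regmatches() silently drops elements with no match, so this assumes every barcode ends in at least one digit.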

Example 2

And, if you prefer, here is a tidyverse-ish approach:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3", 
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5","ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode)

bar_f %>% 
  mutate(factor_for_split = str_extract(string = Barcode,
                                        pattern = "[[:digit:]]")) %>% 
  group_split(factor_for_split)
#> [[1]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-1  1               
#> 2 ABCC-1  1               
#> 
#> [[2]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-2  2               
#> 2 ABCC-2  2               
#> 
#> [[3]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-3  3               
#> 2 ABCC-3  3               
#> 
#> [[4]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-4  4               
#> 2 ABCC-4  4               
#> 
#> [[5]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-5  5               
#> 2 ABCC-5  5               
#> 
#> [[6]]
#> # A tibble: 2 x 2
#>   Barcode factor_for_split
#>   <fct>   <chr>           
#> 1 ABCD-6  6               
#> 2 ABCC-6  6               
#> 
#> attr(,"ptype")
#> # A tibble: 0 x 2
#> # ... with 2 variables: Barcode <fct>, factor_for_split <chr>

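# Assumes the list returned by group_split() above was stored as `lst`
names(lst) <- paste0("df_", seq_along(lst))
# Assign each data frame from the list to an object in the global environment
for(name in names(lst)) {
  assign(name, lst[[name]])
}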

Created on 2020-02-24 by the reprex package (v0.3.0)

Thanks Valentin! Once I type split(x = bar_f, f = as.factor(factor_for_split)), how do I then assign each to be a data frame on its own, to have 6 new data frames say bar_f1, bar_f2,...,bar_f6? Thanks!
Hi @Abigail575, in that case you can use the assign function on each element (data.frame) of the list created with split. I updated my answer accordingly. It wasn't clear to me that you wanted each data frame as a separate object in the global environment. In my opinion, the more canonical R way of dealing with such cases is to keep the objects in a list.
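
As a rough illustration of the list-based workflow suggested in the comment above, the sketch below keeps the six data frames in one list, pulls out a single group by name, and applies a function to every group. It rebuilds the question's data so it runs on its own; the use of sub() to strip everything up to the hyphen is an assumption made here for brevity, not the method used in the answer.

```r
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode, stringsAsFactors = FALSE)

# Group key: whatever follows the last hyphen in each barcode
lst <- split(bar_f, sub(".*-", "", bar_f$Barcode))

# Access one group by its name instead of creating a separate object
lst[["1"]]

# Apply the same operation to every group, here counting the rows
rows_per_group <- vapply(lst, nrow, integer(1))
rows_per_group
```

Keeping the groups in a list means later steps can loop over them with lapply() or vapply() instead of referring to bar_f1 through bar_f6 by hand.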

Roman · Accepted Answer · 2020-02-24 15:21:46Z

You can try:

library(tidyverse)
separate(bar_f, Barcode, into = letters[1:2], sep ="-")

The full tidyverse approach could look like this:

bar_f %>% 
  separate(Barcode, into = letters[1:2], sep ="-") %>% 
  filter(b == 1)
     a b
1 ABCD 1
2 ABCC 1

In base R, you can use gsub() to remove the letters (lower- and upper-case) and the hyphen:

# Remove every letter and hyphen, leaving only the digits
bar_f$SampleID <- gsub("[A-Za-z-]", "", bar_f$Barcode)
head(bar_f)
  Barcode SampleID
1  ABCD-1        1
2  ABCC-1        1
3  ABCD-2        2
4  ABCC-2        2
5  ABCD-3        3
6  ABCC-3        3

edited Feb 24, 2020 at 15:21

answered Feb 24, 2020 at 12:16

B. Christian Kamgang · Accepted Answer · 2020-02-24 15:24:32Z

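Here is another solution using built-in functions:

# Split on the digits left after removing every non-digit character
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))

# Assign each data frame in the list to its own object
for(nm in names(dfs)) assign(nm, dfs[[nm]])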
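
If separate objects really are wanted, base R's list2env() can stand in for the explicit loop. This is offered as a possible alternative rather than something the answer itself proposes; it rebuilds the question's data so the sketch runs on its own.

```r
Barcode <- c("ABCD-1", "ABCC-1", "ABCD-2", "ABCC-2", "ABCD-3", "ABCC-3",
             "ABCD-4", "ABCC-4", "ABCD-5", "ABCC-5", "ABCD-6", "ABCC-6")
bar_f <- data.frame(Barcode, stringsAsFactors = FALSE)

# Same split as above, keyed on the digits in each barcode
dfs <- split(bar_f, gsub("\\D", "", bar_f$Barcode))
names(dfs) <- paste0("df_", names(dfs))

# Copy every named element of the list into the global environment at once
invisible(list2env(dfs, envir = .GlobalEnv))

df_1
```

list2env() requires the list to be fully named, which the paste0() step guarantees here.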

answered Feb 24, 2020 at 15:24
