R, dplyr split dataframe by string in columns

Question

I've seen examples here that show how to split a dataframe using column index ranges but how do I split my dataframe with dplyr using strings found in the column? I've purposely created my dataframes so the columns have certain strings in them for future splitting.

Example data:

Site   A_Argas   A_Arachnicea   A_Brus   B_Burkoll   B_Brielle  B_Bact
 1       10        0              0         0           0         0
 2       0         0              0         10          22        123
 3       1         2              3         88          12        546

I want to split this dataframe up based on strings such as "A_" or "B_" and assign them to new dataframes.

For example the output would be:

 dataframeA
 Site   A_Argas   A_Arachnicea   A_Brus 
 1       10        0              0        
 2       0         0              0        
 3       1         2              3       

 dataframeB
 Site   B_Burkoll   B_Brielle  B_Bact
  1       0           0         0
  2      10          22        123
  3      88          12        546

Because this data is not in long format, I can't seem to figure how to change my old code that I used to split the longform dataframes (for a different analysis).

dataframeA <- data %>% filter("GroupID" == "Arachnids") # where "A_" in column headers signify arachnid species
dataframeB <- data %>% filter("GroupID" == "Bacteria") # where all "B_" in the column headers are bacterial species

Tim Biegeleisen · Accepted Answer · 2022-02-07 16:47:56Z

2

One base R option might be to grep for the columns starting with A_ and B_, then to subset your original data frame.

a_names <- grep("^A_", names(data), value=TRUE)
dataframeA <- data[ , c("Site", a_names)]
b_names <- grep("^B_", names(data), value=TRUE)
dataframeB <- data[ , c("Site", b_names)]

answered Feb 7, 2022 at 16:47

Tim Biegeleisen

526k32 gold badges324 silver badges399 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

CuriousDude Over a year ago

Actually, it didn't work with my full dataframe. Not sure why because the sample I gave here is just a small snippet of it... Your solution does work with the sample set I gave. My full dataframe just has more mini strings like "V_" etc. so I don't see why this won't work

CuriousDude Over a year ago

Oh I think the issue is because I decided to get rid of the "Sites" column and just shove those into my index. It worked once I just got rid of the "Site" call and just used data[,a_names]

PaulS · Accepted Answer · 2022-02-07 17:32:11Z

Another possible solution, based on group_split, pivot_longer, pivot_wider and map:

library(tidyverse)

df <- structure(list(Site = c(1, 2, 3), A_Argas = c(10, 0, 1), A_Arachnicea = c(0, 
0, 2), A_Brus = c(0, 0, 3), B_Burkoll = c(0, 10, 88), B_Brielle = c(0, 
22, 12), B_Bact = c(0, 123, 546)), row.names = c(NA, -3L), class = "data.frame")

df %>%
  pivot_longer(-1) %>% 
  group_split(str_extract(name, "^[A-Z]\\_")) %>% 
  map(~ pivot_wider(.x, id_cols = Site))

#> [[1]]
#> # A tibble: 3 × 4
#>    Site A_Argas A_Arachnicea A_Brus
#>   <dbl>   <dbl>        <dbl>  <dbl>
#> 1     1      10            0      0
#> 2     2       0            0      0
#> 3     3       1            2      3
#> 
#> [[2]]
#> # A tibble: 3 × 4
#>    Site B_Burkoll B_Brielle B_Bact
#>   <dbl>     <dbl>     <dbl>  <dbl>
#> 1     1         0         0      0
#> 2     2        10        22    123
#> 3     3        88        12    546

akrun · Accepted Answer · 2022-02-07 16:57:10Z

We may use split.default for this in base R, remove the first column ('Site'), split the data into a list of dataframe by removing the substring from the _ (using trimws), cbind the first column in the list of data.frames and change the list name

lst1 <- split.default(data[-1], trimws(names(data)[-1], whitespace = "_.*"))

lst2 <- Map(cbind, Site = data['Site'], lst1)
names(lst2) <- paste0("dataframe", names(lst1))

-output

> lst2
$dataframeA
  Site A_Argas A_Arachnicea A_Brus
1    1      10            0      0
2    2       0            0      0
3    3       1            2      3

$dataframeB
  Site B_Burkoll B_Brielle B_Bact
1    1         0         0      0
2    2        10        22    123
3    3        88        12    546

It may be better to keep it in a list and access the elements as

lst2[["dataframeA"]]
lst2$dataframeA

instead of creating multiple objects (although can be created with list2env

list2env(lst2, .GlobalEnv)

With tidyverse, we could loop over the substring of the column names and select the relevant columns with map

library(purrr)
library(dplyr)
library(stringr)
str_remove(names(data)[-1] , "_.*") %>%
   unique %>% 
   map(~ data %>% 
   select(Site, starts_with(.x)))

-output

[[1]]
  Site A_Argas A_Arachnicea A_Brus
1    1      10            0      0
2    2       0            0      0
3    3       1            2      3

[[2]]
  Site B_Burkoll B_Brielle B_Bact
1    1         0         0      0
2    2        10        22    123
3    3        88        12    546

data

data <- structure(list(Site = 1:3, A_Argas = c(10L, 0L, 1L), A_Arachnicea = c(0L, 
0L, 2L), A_Brus = c(0L, 0L, 3L), B_Burkoll = c(0L, 10L, 88L), 
    B_Brielle = c(0L, 22L, 12L), B_Bact = c(0L, 123L, 546L)), 
class = "data.frame", row.names = c(NA, 
-3L))

Collectives™ on Stack Overflow

R, dplyr split dataframe by string in columns

3 Answers 3

2 Comments

Comments

data

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

data

Comments

Your Answer

Sign up or log in

Post as a guest

Related