3

I've seen examples here that show how to split a dataframe using column index ranges but how do I split my dataframe with dplyr using strings found in the column? I've purposely created my dataframes so the columns have certain strings in them for future splitting.

Example data:

Site   A_Argas   A_Arachnicea   A_Brus   B_Burkoll   B_Brielle  B_Bact
 1       10        0              0         0           0         0
 2       0         0              0         10          22        123
 3       1         2              3         88          12        546

I want to split this dataframe up based on strings such as "A_" or "B_" and assign them to new dataframes.

For example the output would be:

 dataframeA
 Site   A_Argas   A_Arachnicea   A_Brus 
 1       10        0              0        
 2       0         0              0        
 3       1         2              3       

 dataframeB
 Site   B_Burkoll   B_Brielle  B_Bact
  1       0           0         0
  2      10          22        123
  3      88          12        546

Because this data is not in long format, I can't seem to figure how to change my old code that I used to split the longform dataframes (for a different analysis).

dataframeA <- data %>% filter("GroupID" == "Arachnids") # where "A_" in column headers signify arachnid species
dataframeB <- data %>% filter("GroupID" == "Bacteria") # where all "B_" in the column headers are bacterial species

3 Answers 3

2

One base R option might be to grep for the columns starting with A_ and B_, then to subset your original data frame.

a_names <- grep("^A_", names(data), value=TRUE)
dataframeA <- data[ , c("Site", a_names)]
b_names <- grep("^B_", names(data), value=TRUE)
dataframeB <- data[ , c("Site", b_names)]
Sign up to request clarification or add additional context in comments.

2 Comments

Actually, it didn't work with my full dataframe. Not sure why because the sample I gave here is just a small snippet of it... Your solution does work with the sample set I gave. My full dataframe just has more mini strings like "V_" etc. so I don't see why this won't work
Oh I think the issue is because I decided to get rid of the "Sites" column and just shove those into my index. It worked once I just got rid of the "Site" call and just used data[,a_names]
2

Another possible solution, based on group_split, pivot_longer, pivot_wider and map:

library(tidyverse)

df <- structure(list(Site = c(1, 2, 3), A_Argas = c(10, 0, 1), A_Arachnicea = c(0, 
0, 2), A_Brus = c(0, 0, 3), B_Burkoll = c(0, 10, 88), B_Brielle = c(0, 
22, 12), B_Bact = c(0, 123, 546)), row.names = c(NA, -3L), class = "data.frame")

df %>%
  pivot_longer(-1) %>% 
  group_split(str_extract(name, "^[A-Z]\\_")) %>% 
  map(~ pivot_wider(.x, id_cols = Site))

#> [[1]]
#> # A tibble: 3 × 4
#>    Site A_Argas A_Arachnicea A_Brus
#>   <dbl>   <dbl>        <dbl>  <dbl>
#> 1     1      10            0      0
#> 2     2       0            0      0
#> 3     3       1            2      3
#> 
#> [[2]]
#> # A tibble: 3 × 4
#>    Site B_Burkoll B_Brielle B_Bact
#>   <dbl>     <dbl>     <dbl>  <dbl>
#> 1     1         0         0      0
#> 2     2        10        22    123
#> 3     3        88        12    546

Comments

1

We may use split.default for this in base R, remove the first column ('Site'), split the data into a list of dataframe by removing the substring from the _ (using trimws), cbind the first column in the list of data.frames and change the list name

lst1 <- split.default(data[-1], trimws(names(data)[-1], whitespace = "_.*"))

lst2 <- Map(cbind, Site = data['Site'], lst1)
names(lst2) <- paste0("dataframe", names(lst1))

-output

> lst2
$dataframeA
  Site A_Argas A_Arachnicea A_Brus
1    1      10            0      0
2    2       0            0      0
3    3       1            2      3

$dataframeB
  Site B_Burkoll B_Brielle B_Bact
1    1         0         0      0
2    2        10        22    123
3    3        88        12    546

It may be better to keep it in a list and access the elements as

lst2[["dataframeA"]]
lst2$dataframeA

instead of creating multiple objects (although can be created with list2env

list2env(lst2, .GlobalEnv)

With tidyverse, we could loop over the substring of the column names and select the relevant columns with map

library(purrr)
library(dplyr)
library(stringr)
str_remove(names(data)[-1] , "_.*") %>%
   unique %>% 
   map(~ data %>% 
   select(Site, starts_with(.x)))

-output

[[1]]
  Site A_Argas A_Arachnicea A_Brus
1    1      10            0      0
2    2       0            0      0
3    3       1            2      3

[[2]]
  Site B_Burkoll B_Brielle B_Bact
1    1         0         0      0
2    2        10        22    123
3    3        88        12    546

data

data <- structure(list(Site = 1:3, A_Argas = c(10L, 0L, 1L), A_Arachnicea = c(0L, 
0L, 2L), A_Brus = c(0L, 0L, 3L), B_Burkoll = c(0L, 10L, 88L), 
    B_Brielle = c(0L, 22L, 12L), B_Bact = c(0L, 123L, 546L)), 
class = "data.frame", row.names = c(NA, 
-3L))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.