
I'm sure the answer to this will be VERY similar to this question but I just can't quite put it together.

I have two data frames. One is the data frame I'm working on:

df <- structure(list(Username = c("hmaens", "pgcmann",
                                  "gsamse", "gsamse", 
                                  "gsamse", "gamse"),
                     Title = c("Pharmacy Resident PGY2",
                               "Associate Professor of Pediatrics", 
                               "Regulatory Coordinator",
                               "Regulatory Coordinator",
                               "Regulatory Coordinator", 
                               "Regulatory Coordinator"),
                     `User Role` = c("Investigational Pharmacist", 
                                     "Principal Investigator",
                                     "Calendar Build",
                                     "Protocol Management", 
                                     "Subject Management",
                                     "Regulatory")),
                row.names = c(NA, -6L), class = c("tbl_df", 
                                                  "tbl", "data.frame"))

and one is the key:

key <- structure(list(username = c("hmaens", "pgcmann",
                                   "gsamse", "gsamse", 
                                   "gsamse", "gsamse"),
                      training = c(0, 0, 1, 
                                   1, 1, 1)),
                 row.names = c(NA, -6L), 
                 class = c("tbl_df", "tbl", "data.frame"))

I want to split my "df" data frame based on the "training" column in key. That is, one result would be a data frame called dfZero, with exactly the same columns as df, containing every user who has a 0 in key$training. The other would be a separate data frame called dfOne, containing the users who have a 1 in key$training.

  • You could use df %>% left_join(key, by=c("Username"="username")) %>% split(~training). That will give you a list with the two separate data.frames. Commented Apr 7, 2022 at 15:12
  • What of the other usernames? gsame is not present in key, so that training is NA. Commented Apr 7, 2022 at 15:25
  • FYI @MrFlick, that method mostly works but drops the NA values. An alternative is to use dplyr::nest_by(training) which will preserve them. Commented Apr 7, 2022 at 15:30
  • dfZero <- df[df$Username %in% key$username[key$training == 0], ] Commented Apr 7, 2022 at 15:30
  • oops. gsame is my typo. They're all supposed to be gsamse Commented Apr 7, 2022 at 15:47
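The join-then-split idea from the comments can be sketched as below. This is only an illustration under assumptions: it needs dplyr installed, it uses a cut-down stand-in for the question's data (the `Role` column and the `"unmatched"` label are made up for the example), and it relabels missing `training` values explicitly because `split()` drops `NA` groups by default.

```r
library(dplyr)

# Stand-in data: "gamse" is deliberately absent from key, so it gets NA after the join
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gamse"),
  Role = c("Investigational Pharmacist", "Principal Investigator",
           "Calendar Build", "Protocol Management",
           "Subject Management", "Regulatory")
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# Collapse key to one row per username so the join cannot multiply rows
key_unique <- distinct(key)

joined <- left_join(df, key_unique, by = c("Username" = "username"))

# Give rows without a match their own label so split() keeps them
group <- ifelse(is.na(joined$training), "unmatched", as.character(joined$training))
by_training <- split(joined, group)

dfZero <- by_training[["0"]]
dfOne  <- by_training[["1"]]
```

Each element of `by_training` is an ordinary data frame with its column names intact, so `dfZero` and `dfOne` can be used directly.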

2 Answers

3

Using %in%

# key$username[...] returns a plain character vector, even when key is a tibble
dfZero <- df[df$Username %in% key$username[key$training == 0], ]
dfOne <- df[df$Username %in% key$username[key$training == 1], ]

Using merge()

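# De-duplicate key first; repeated usernames would otherwise multiply the matching rows.
# Note that the merged result also carries the training column.
dfZero <- merge(df, unique(key[key$training == 0, ]), by.x = "Username", by.y = "username")
dfOne <- merge(df, unique(key[key$training == 1, ]), by.x = "Username", by.y = "username")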
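One thing to watch with `merge()` here: the key in the question lists the same username several times, and a merge pairs every matching row of one table with every matching row of the other. A small base R sketch of the effect, using a cut-down stand-in for the question's data (the `Role` column is made up for the example):

```r
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse"),
  Role = c("Investigational Pharmacist", "Principal Investigator",
           "Calendar Build", "Protocol Management", "Subject Management")
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

trained <- key[key$training == 1, ]  # four identical "gsamse" rows

# 3 df rows x 4 key rows for "gsamse"
naive <- merge(df, trained, by.x = "Username", by.y = "username")

# One key row per username keeps one output row per df row
deduped <- merge(df, unique(trained), by.x = "Username", by.y = "username")

nrow(naive)    # 12
nrow(deduped)  # 3
```

The `%in%` approach does not have this problem, since it only filters rows of `df` and never combines them with rows of `key`.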

0

Using dplyr:

library(dplyr)

dflist <- merge(df, key, by.x = "Username", by.y = "username") %>%
  unique() %>%
  group_by(training) %>%
  group_split() 

edit: You can extract the individual list elements like so:

dfzero <- dflist[[1]]
dfone <- dflist[[2]]

1 Comment

I'm sorry for the silly question, because I'm sure this pretty much works, but I can't figure out how to then get these data frames out of the list. I've found other answers, like here: stackoverflow.com/questions/59169631/… They explain that, but when I use those solutions my data frames don't have column names anymore. Would you mind taking your answer one step further to actually produce separate data frames? I ultimately need to use write_csv to export them.
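For the export step asked about in the comment, one possible sketch follows. It is not the answerer's method: it swaps `group_split()` for base `split()` (which names the list elements after the `training` values) and uses base `write.csv()` so that no packages are required. `readr::write_csv(x, file)` could be used in the same place. The file names and the use of `tempdir()` are hypothetical choices for the example.

```r
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse"),
  Role = c("Investigational Pharmacist", "Principal Investigator",
           "Calendar Build", "Protocol Management", "Subject Management")
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# Same merge + unique as in the answer, then a named split on training
merged <- unique(merge(df, key, by.x = "Username", by.y = "username"))
dflist <- split(merged, merged$training)  # elements are named "0" and "1"

# Write one CSV per group; out_dir and the file name pattern are made up here
out_dir <- tempdir()
paths <- character(0)
for (level in names(dflist)) {
  path <- file.path(out_dir, paste0("training_", level, ".csv"))
  write.csv(dflist[[level]], path, row.names = FALSE)
  paths[level] <- path
}
```

Pulling an element out with double brackets, as in `dflist[["0"]]`, returns a complete data frame with its column names, whereas single brackets return a one-element list.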
