Filter data frame based off two columns in other data frame

Question

I'm sure the answer to this will be VERY similar to this question but I just can't quite put it together.

I have two data frames. One is the data frame I'm working on:

df <- structure(list(Username = c("hmaens", "pgcmann",
                                  "gsamse", "gsamse", 
                                  "gsamse", "gamse"),
                     Title = c("Pharmacy Resident PGY2",
                               "Associate Professor of Pediatrics", 
                               "Regulatory Coordinator",
                               "Regulatory Coordinator",
                               "Regulatory Coordinator", 
                               "Regulatory Coordinator"),
                     `User Role` = c("Investigational Pharmacist", 
                                     "Principal Investigator",
                                     "Calendar Build",
                                     "Protocol Management", 
                                     "Subject Management",
                                     "Regulatory")),
                row.names = c(NA, -6L), class = c("tbl_df", 
                                                  "tbl", "data.frame"))

and one is the key:

key <- structure(list(username = c("hmaens", "pgcmann",
                                   "gsamse", "gsamse", 
                                   "gsamse", "gsamse"),
                      training = c(0, 0, 1, 
                                   1, 1, 1)),
                 row.names = c(NA, -6L), 
                 class = c("tbl_df", "tbl", "data.frame"))

I want to split my "df" data frame based on the "training" column in key. That is, I want a data frame called dfZero, with exactly the same columns as df, containing everyone who has a "0" in key$training, and a separate data frame called dfOne containing everyone who has a "1" in key$training.

You could use df %>% left_join(key, by=c("Username"="username")) %>% split(~training). That will give you a list with the two separate data.frames.
– MrFlick, Commented Apr 7, 2022 at 15:12
What of the other usernames? gamse is not present in key, so its training is NA.
– r2evans, Commented Apr 7, 2022 at 15:25
FYI @MrFlick, that method mostly works but drops the NA values. An alternative is to use dplyr::nest_by(training) which will preserve them.
– r2evans, Commented Apr 7, 2022 at 15:30
dfZero <- df[df$Username %in% key$username[key$training == 0], ]
– Skaqqs, Commented Apr 7, 2022 at 15:30
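
The join-then-split idea from the comments can be sketched roughly as below. This is only an illustration, not the commenters' exact code: the data frames are trimmed copies of the ones in the question (Title is left out), and distinct(key) is an added assumption so that the repeated "gsamse" rows in key do not multiply rows of df. The names key_unique, joined, parts and unmatched are made up for the example.

```r
library(dplyr)

# Trimmed copies of the question's data (Title column omitted for brevity)
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gamse"),
  `User Role` = c("Investigational Pharmacist", "Principal Investigator",
                  "Calendar Build", "Protocol Management",
                  "Subject Management", "Regulatory"),
  check.names = FALSE
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# Assumption: collapse repeated key rows before joining
key_unique <- distinct(key)

# Every row of df is kept; usernames missing from key get training = NA
joined <- left_join(df, key_unique, by = c("Username" = "username"))

# split() silently drops the rows whose grouping value is NA
parts <- split(joined, joined$training)

# Rows with no entry in key ("gamse" here) can be collected separately
unmatched <- filter(joined, is.na(training))
```

parts[["0"]] and parts[["1"]] are then ordinary data frames with their column names intact, and unmatched holds whatever split() left out.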

Skaqqs · Accepted Answer · 2022-04-07 16:25:52Z

3

Using %in%

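# key is a tibble, so key[rows, "username"] stays a one-column tibble rather than a vector;
# pull the column with $ so %in% compares against the usernames themselves
dfZero <- df[df$Username %in% key$username[key$training == 0], ]
dfOne <- df[df$Username %in% key$username[key$training == 1], ]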

Using merge()

# unique() collapses repeated key rows so merge() does not duplicate rows of df
dfZero <- merge(df, unique(key[key$training == 0,]), by.x = "Username", by.y = "username")
dfOne <- merge(df, unique(key[key$training == 1,]), by.x = "Username", by.y = "username")
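
Since the question asks for exactly the same columns as df, a filtering join may also be worth a look. This is not part of the answer above, just a hedged sketch of an alternative: merge() appends the training column from key, whereas dplyr::semi_join() only keeps the rows of df that have a match and never adds columns or repeats rows. The data frames are trimmed copies of the ones in the question (Title omitted).

```r
library(dplyr)

# Trimmed copies of the question's data (Title column omitted for brevity)
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gamse"),
  `User Role` = c("Investigational Pharmacist", "Principal Investigator",
                  "Calendar Build", "Protocol Management",
                  "Subject Management", "Regulatory"),
  check.names = FALSE
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# semi_join() keeps rows of df whose Username appears in the filtered key;
# only df's columns are returned, and repeated key rows do not duplicate df rows
dfZero <- semi_join(df, filter(key, training == 0), by = c("Username" = "username"))
dfOne  <- semi_join(df, filter(key, training == 1), by = c("Username" = "username"))
```

Usernames that do not appear in key at all (such as "gamse") end up in neither data frame.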


nogbad · Accepted Answer · 2022-04-08 08:11:01Z

0

Using dplyr:

library(dplyr)

dflist <- merge(df, key, by.x = "Username", by.y = "username") %>%
  unique() %>%
  group_by(training) %>%
  group_split()

edit: You can extract the individual list elements like so:

dfzero <- dflist[[1]]
dfone <- dflist[[2]]
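
If the end goal is one CSV file per group, something along these lines might work. It is a rough sketch under a few assumptions: base split() is swapped in for group_split() so that the list elements are named after the training values, base write.csv() is used (readr::write_csv() could be substituted), and out_dir and the training_<value>.csv file names are made up for the example. The data frames are trimmed copies of the ones in the question.

```r
# Trimmed copies of the question's data (Title column omitted for brevity)
df <- data.frame(
  Username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gamse"),
  `User Role` = c("Investigational Pharmacist", "Principal Investigator",
                  "Calendar Build", "Protocol Management",
                  "Subject Management", "Regulatory"),
  check.names = FALSE
)
key <- data.frame(
  username = c("hmaens", "pgcmann", "gsamse", "gsamse", "gsamse", "gsamse"),
  training = c(0, 0, 1, 1, 1, 1)
)

# Same merge + unique() as above, then a named list: "0" and "1"
merged <- unique(merge(df, key, by.x = "Username", by.y = "username"))
dflist <- split(merged, merged$training)

# Hypothetical output location; replace with a real folder
out_dir <- tempdir()

# Write one file per training value; column names are kept in the header row
for (level in names(dflist)) {
  path <- file.path(out_dir, paste0("training_", level, ".csv"))
  write.csv(dflist[[level]], path, row.names = FALSE)
}
```

Each list element is still a full data frame, so dflist[["0"]] and dflist[["1"]] can also be assigned to dfzero and dfone directly.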

edited Apr 8, 2022 at 8:11

answered Apr 7, 2022 at 15:43


1 Comment

Joe Crozier Over a year ago

I'm sorry for the silly question, because I'm sure this pretty much works, but I can't figure out how to then get these data frames out of the list. I've found other answers, like here: stackoverflow.com/questions/59169631/… that explain it, but when I use those solutions my data frames don't have column names anymore. Would you mind taking your answer one step further, to actually having separate data frames? I ultimately need to use write_csv to export them.
