I have the following sample dataframe in R:

Test <- data.frame("Individual"=c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve","Eve","Jack"), "ExamNumber"=c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3",  "Test3"))

Which gives:

  Individual ExamNumber
1       John      Test1
2       John      Test2
3      Alice      Test1
4      Alice      Test2
5      Alice      Test3
6        Eve      Test1
7        Eve      Test2
8        Eve      Test3
9       Jack      Test3

However, I want to remove any individual who does not have all three tests, which should result in:

  Individual ExamNumber
1      Alice      Test1
2      Alice      Test2
3      Alice      Test3
4        Eve      Test1
5        Eve      Test2
6        Eve      Test3

3 Answers

Here is another way, using dplyr to check whether all three tests exist within each group:

library(dplyr)
Test %>% 
  group_by(Individual) %>%
  filter(all(c("Test1", "Test2", "Test3") %in% ExamNumber)) %>%
  ungroup()

# A tibble: 6 × 2
  Individual ExamNumber
      <fctr>     <fctr>
1      Alice      Test1
2      Alice      Test2
3      Alice      Test3
4        Eve      Test1
5        Eve      Test2
6        Eve      Test3
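If you would rather not depend on dplyr, the same "every required test is present" check can be sketched in base R with split() and Filter(). This is a minimal sketch, assuming the required tests are known up front; required_tests, exams_by_person and keep_ids are hypothetical names introduced for illustration.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps both columns character
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

# Hypothetical: the tests every individual must have taken
required_tests <- c("Test1", "Test2", "Test3")

# One vector of exams per individual, then keep the individuals that have every required test
exams_by_person <- split(Test$ExamNumber, Test$Individual)
keep_ids <- names(Filter(function(x) all(required_tests %in% x), exams_by_person))

# Subset the original rows; original row order and row names are preserved
result <- Test[Test$Individual %in% keep_ids, ]
print(result)
```

Replacing required_tests with unique(Test$ExamNumber) would instead require every test that appears anywhere in the data.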

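You can use ave to group by Individual and keep only the groups with exactly 3 rows, using NROW as the per-group function. This counts rows rather than distinct tests, so it assumes no individual has duplicate entries: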

Test[ave(1:nrow(Test), Test$Individual, FUN = NROW)==3,]
#  Individual ExamNumber
#3      Alice      Test1
#4      Alice      Test2
#5      Alice      Test3
#6        Eve      Test1
#7        Eve      Test2
#8        Eve      Test3
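To see why counting rows can go wrong, here is a sketch with a hypothetical duplicate row in which John retakes Test1: he then has 3 rows but only 2 distinct tests. Counting rows with NROW keeps him, while counting distinct exams per group (still with ave) drops him. Test_dup, n_rows, n_tests, by_rows and by_tests are hypothetical names.

```r
# Sample data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

# Hypothetical duplicate: John sits Test1 a second time
Test_dup <- rbind(Test, data.frame(Individual = "John", ExamNumber = "Test1",
                                   stringsAsFactors = FALSE))

# Rows per individual, as in the NROW answer
n_rows <- ave(seq_len(nrow(Test_dup)), Test_dup$Individual, FUN = NROW)

# Distinct exams per individual; exams are recoded as integers so ave() returns numbers
n_tests <- ave(as.integer(factor(Test_dup$ExamNumber)), Test_dup$Individual,
               FUN = function(x) length(unique(x)))

by_rows  <- Test_dup[n_rows == 3, ]   # also keeps John's three rows
by_tests <- Test_dup[n_tests == 3, ]  # keeps only Alice and Eve
```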

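And here is a slightly more robust approach using the same idea but with split, which checks that each individual has every distinct ExamNumber in the data instead of counting rows: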

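# Sort rows by Individual so they line up with the group order used by split(),
# then keep each group only if it contains every distinct ExamNumber in the data
Test[order(Test$Individual),][unlist(lapply(split(Test, Test$Individual), function(a)
          rep(all(unique(Test$ExamNumber) %in% a$ExamNumber), NROW(a)))),]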
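A variation on the split idea, sketched here with base R's unsplit(), maps the per-group results back onto the original rows, so the data never has to be reordered with order() and the output keeps the original row order. all_tests, per_group and keep are hypothetical names.

```r
# Sample data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

all_tests <- unique(Test$ExamNumber)

# One logical per row within each group: TRUE when the group has every test seen in the data
per_group <- lapply(split(Test, Test$Individual), function(a)
  rep(all(all_tests %in% a$ExamNumber), nrow(a)))

# unsplit() reassembles the per-group vectors in the original row order
keep <- unsplit(per_group, Test$Individual)
result <- Test[keep, ]
print(result)
```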


Using base R, keeping individuals with exactly three distinct exams:

ind_eq3 <- names( which( with( Test, by( Test, 
                                         INDICES = list(Individual), 
                                         FUN = function(x) length(unique(x$ExamNumber)) == 3) ) ) )
with(Test, Test[ Individual %in% ind_eq3, ] )

#   Individual ExamNumber
# 3      Alice      Test1
# 4      Alice      Test2
# 5      Alice      Test3
# 6        Eve      Test1
# 7        Eve      Test2
# 8        Eve      Test3
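The same filter can also be sketched with a contingency table: table() cross-tabulates individuals against exams, and an individual is kept when every exam column has a nonzero count. Like the split answer, this assumes the required tests are exactly the ones appearing in the data; tab and keep_ids are hypothetical names. Because it checks for counts above zero, duplicate rows do not change the result.

```r
# Sample data from the question
Test <- data.frame(
  Individual = c("John", "John", "Alice", "Alice", "Alice", "Eve", "Eve", "Eve", "Jack"),
  ExamNumber = c("Test1", "Test2", "Test1", "Test2", "Test3", "Test1", "Test2", "Test3", "Test3"),
  stringsAsFactors = FALSE
)

# Rows: individuals, columns: exams, cells: number of times each exam was taken
tab <- table(Test$Individual, Test$ExamNumber)

# Keep individuals with a nonzero count in every exam column
keep_ids <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]
result <- Test[Test$Individual %in% keep_ids, ]
print(result)
```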

Using data.table (note that setDT() converts Test to a data.table in place, by reference):

library('data.table')
setDT(Test)[ , 
             j  = .SD[length( unique(ExamNumber) ) == 3, ],
             by = 'Individual']
