
This is a follow-up question to Kikapp's answer.

I want to remove participant IDs that do not have all the time points. Put the other way around, I want to keep only the rows for IDs that have all four time points (11, 21, 31, 41). See the sample data (Dropbox link).

Here is my attempt based on Kikapp's answer. For some reason it doesn't work. Please let me know how to fix it.

data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],] 

I get 4695 rows for time == 11, time == 21, and time == 41, but only 4693 for time == 31; I expect all four counts to be equal.

  • Try: do.call(rbind,Filter(function(x) { length(unique(x[,2])) == 4 },split(df, df$ID))). Commented Sep 30, 2016 at 18:38
  • or df %>% group_by(ID) %>% dplyr::filter(length(unique(time)) == 4) %>% data.frame() with dplyr. Commented Sep 30, 2016 at 18:45
  • @Abdou - Thanks! The first code did not work. The second gives the same result as my data2 code: I get two fewer rows with time == 31. All four time points (11, 21, 31, 41) should have the same number of rows. With either data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],] or your df %>% group_by(ID) %>% dplyr::filter(length(unique(time)) == 4) %>% data.frame(), I get 4695 rows for times 11, 21, and 41, but 4693 for time 31. Commented Sep 30, 2016 at 19:01
  • Both code snippets I provided do exactly the same thing, so I am not sure what you mean by "First code did not work". It looks like you have 2 rows in your data with a time value of 32. You did not mention that there are rows with values of 32 for time. Commented Sep 30, 2016 at 19:15
  • I will write up an answer to explain how I found that there were rows with 32. Commented Sep 30, 2016 at 19:33

1 Answer

You can use dplyr for this task for a much faster result:

df1 <- df %>% group_by(ID) %>% 
    dplyr::filter(length(unique(time)) == 4) %>% 
    data.frame()
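Note that this filter counts distinct time values per ID, whereas the row-count filter in the question counts rows per ID. The two agree only when no ID has repeated time values. A minimal base R sketch of the difference, using a made-up toy data frame (the values below are hypothetical, not taken from the linked data):

```r
# Toy data (hypothetical): ID 2 has four rows but only three distinct time points.
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2),
  time = c(11, 21, 31, 41, 11, 21, 21, 31)
)

# Row-count rule (as in the question): number of rows per ID.
rows_per_id <- tapply(df$time, df$ID, length)

# Distinct-count rule (as in this answer): number of unique time values per ID.
distinct_per_id <- tapply(df$time, df$ID, function(x) length(unique(x)))

# IDs each rule would keep.
keep_by_rows     <- names(rows_per_id)[rows_per_id > 3]
keep_by_distinct <- names(distinct_per_id)[distinct_per_id == 4]

print(keep_by_rows)
print(keep_by_distinct)
```

In this toy case the row-count rule keeps both IDs, while the distinct-count rule keeps only ID 1.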

However, when you count the rows for each time value, you will find that there are 32s hidden in the data (2 rows in total):

df1 %>% group_by(time) %>% 
    dplyr::summarise(Counts = n()) %>% 
    data.frame()

#Output:
  time Counts
    11   4695
    21   4695
    31   4693
    32      2
    41   4695

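This shows that you have 2 rows with a time value of 32. An ID that has 32 in place of 31 still has four distinct time values, so it passes the filter. As it turns out, the 32s were due to a typo on your part, so you can correct them with df$time[df$time == 32] <- 31 and run the code again.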
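If you would rather not depend on the number of distinct values, one alternative is to require the exact set of expected time points for each ID. This is only a sketch in base R, assuming the columns are named ID and time as in the question; the toy data and the helper names (expected_times, id_is_complete, complete_ids) are made up for illustration:

```r
# Expected time points, taken from the question.
expected_times <- c(11, 21, 31, 41)

# Toy data (hypothetical): ID 2 has 32 instead of 31, ID 3 is missing two time points.
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
  time = c(11, 21, 31, 41, 11, 21, 32, 41, 11, 21)
)

# Values outside the expected set; worth inspecting before filtering.
unexpected <- setdiff(unique(df$time), expected_times)
print(unexpected)

# For each ID, check that every expected time point is present.
id_is_complete <- tapply(df$time, df$ID, function(x) all(expected_times %in% x))
complete_ids   <- names(id_is_complete)[id_is_complete]

# Keep only rows belonging to complete IDs.
complete <- df[as.character(df$ID) %in% complete_ids, ]

# Each expected time point should now have the same count.
print(table(complete$time))
```

With this rule, an ID carrying a stray value such as 32 is dropped rather than silently kept, so you would still want to fix the typo first if those IDs should stay in the data.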

I hope this was helpful.

Thanks!
