Select multiple duplicate rows based on specific values in next column

Question

This is a follow-up question to Kikapp's answer.

I want to remove participant IDs that are missing any of the time points. In other words, I want to keep only the rows for IDs that have all four time values (11, 21, 31, 41). See the sample data dropbox link

Here is my attempt based on Kikapp's answer. For some reason, it doesn't work. Let me know how to improve it.

data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],]

I get 4695 rows for time == 11, time == 21, and time == 41, but only 4693 for time == 31. I want all four counts to be equal.

Try: do.call(rbind, Filter(function(x) { length(unique(x[,2])) == 4 }, split(df, df$ID))).
– Abdou, Commented Sep 30, 2016 at 18:38
Or, with dplyr: df %>% group_by(ID) %>% dplyr::filter(length(unique(time)) == 4) %>% data.frame().
– Abdou, Commented Sep 30, 2016 at 18:45
@Abdou - Thanks! The first code did not work. The second gives the same result as my data2 code. With either my data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],] code or your dplyr code, I get 4695 rows for time 11, 21, and 41, but only 4693 for time 31, two fewer. All four time points (11, 21, 31, 41) should have the same number of rows.
– Aby, Commented Sep 30, 2016 at 19:01
Both code snippets I provided do exactly the same thing, so I am not sure what you mean by "First code did not work". It looks like you have 2 rows in your data with a time value of 32. You did not mention that there are rows with a time value of 32.
– Abdou, Commented Sep 30, 2016 at 19:15
I will write up an answer to explain how I found that there were rows with 32.
– Abdou, Commented Sep 30, 2016 at 19:33

Abdou · Accepted Answer · 2016-09-30 19:40:43Z

You can use dplyr for this task for a much faster result:

library(dplyr)

df1 <- df %>% group_by(ID) %>% 
    dplyr::filter(length(unique(time)) == 4) %>% 
    data.frame()

However, when you count the rows for each time value, you will find that there are 32s hidden in there (2 rows in total):

df1 %>% group_by(time) %>% 
    dplyr::summarise(Counts = n()) %>% 
    data.frame()

#Output:
time Counts
 11   4695  
 21   4695  
 31   4693  
 32      2  
 41   4695
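
The rows with 32 get through because length(unique(time)) == 4 only counts distinct time values per ID; it does not check which values they are. The sketch below reproduces this on a small made-up data frame (df_toy and its values are hypothetical, not the asker's data) and uses setdiff() to list any time values outside the expected set.

```r
library(dplyr)

# Hypothetical toy data: ID 2 is missing time 31,
# and ID 3 has 32 where 31 was presumably intended.
df_toy <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  time = c(11, 21, 31, 41, 11, 21, 41, 11, 21, 32, 41)
)

# Same filter as above: keep IDs that have four distinct time values.
kept <- df_toy %>%
  group_by(ID) %>%
  dplyr::filter(length(unique(time)) == 4) %>%
  ungroup() %>%
  as.data.frame()

# ID 3 is kept because 11, 21, 32, 41 are four distinct values.
# Any value outside the expected set points to a data-entry problem.
unexpected <- setdiff(unique(kept$time), c(11, 21, 31, 41))
unexpected
```

In this toy example, unexpected contains only 32, which is the same kind of stray value the counts above reveal.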

This shows that you have 2 rows with values of 32. As it turns out, that was due to a typo on your part. So you can change them with df$time[df$time == 32] <- 31 and run the code again.
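
As a possible variation, here is a minimal sketch that applies the recoding and then filters on the specific values rather than on their count, so a stray value cannot stand in for a missing time point. The data frame df_toy and the vector name required are hypothetical and only for illustration; this is not the code from the answer above.

```r
library(dplyr)

# Hypothetical toy data: ID 2 is missing time 31,
# and ID 3 has 32 where 31 was presumably intended.
df_toy <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  time = c(11, 21, 31, 41, 11, 21, 41, 11, 21, 32, 41)
)

# Recode the mistyped value, as described above.
df_toy$time[df_toy$time == 32] <- 31

# Time points every ID must have (assumed from the question).
required <- c(11, 21, 31, 41)

# Keep only IDs whose rows cover every required time point.
df_fixed <- df_toy %>%
  group_by(ID) %>%
  dplyr::filter(all(required %in% time)) %>%
  ungroup() %>%
  as.data.frame()

# Each time value should now appear the same number of times.
table(df_fixed$time)
```

With the toy data, ID 2 is dropped for lacking 31, while IDs 1 and 3 remain with one row per time point.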

I hope this was helpful.

Thanks!
