I have a data.table in R which has several ids and a value. For each combination of ids, there are several rows. If one of these rows contains NA in the column 'value', I would like to remove all rows with this combination of ids. For example, in the table below, I would like to remove all rows for which id1 == 2 and id2 == 1.

If I had only one id, I would do dat[!(id1 %in% dat[is.na(value),id1])]. In the example, this would remove all rows where id1 == 2. However, I have not managed to extend this to several id columns.

library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2),
                  id2 = c(1,2,1,2,3,1),
                  value = c(5,3,NA,6,7,3))
5 Comments

  • Try dat[!(id1==2 & id2==1)] or setkey(dat, id1, id2)[!J(2,1)] Commented Jan 17, 2015 at 17:45
  • I know that this would work in the simple example above. However, the question is meant to be more general, as there might be a large number of rows with NAs. Commented Jan 17, 2015 at 17:49
  • I think he is looking for dat[, if(all(!is.na(value))) .SD, .(id1, id2)] Commented Jan 17, 2015 at 17:49
  • @lilaf Okay, I have only just read the part about the NA. My comment was based on "I would like to remove all rows for which id1 == 2 and id2 == 1". Commented Jan 17, 2015 at 17:54
  • @akrun Anyway, thank you for your answer. I'll try to be clearer next time. Commented Jan 17, 2015 at 18:01

1 Answer

If you want to check, for each combination of id1 and id2, whether any of the values is NA and then remove that whole combination, group by both columns and wrap .SD in an if statement. The group's rows (.SD) are returned only when the condition is TRUE; a group for which it is FALSE returns NULL and is left out of the result.

dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
#    id1 id2 value
# 1:   1   1     5
# 2:   1   2     3
# 3:   2   2     6
# 4:   2   3     7

Or similarly,

dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
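
As a rough sanity check, the sketch below rebuilds the example table, runs the grouped filter, and compares it with an anti-join on the id pairs that contain an NA. The anti-join is not part of the answer above; it is an alternative shown under the assumption that data.table 1.9.6 or later is installed (for the on= argument). The names kept_by_group, bad_ids and kept_by_join are illustrative only.

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Grouped filter: return .SD only for groups without any NA in value.
kept_by_group <- dat[, if (!anyNA(value)) .SD, by = .(id1, id2)]

# Anti-join (assumed alternative): collect the id pairs that contain an NA,
# then drop every row of dat whose (id1, id2) matches one of those pairs.
bad_ids <- unique(dat[is.na(value), .(id1, id2)])
kept_by_join <- dat[!bad_ids, on = .(id1, id2)]
```

On this example both objects hold the same four rows. The anti-join avoids materialising .SD for every group, which may matter when there are many small groups, though that is worth benchmarking on real data rather than assuming.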

3 Comments

It might be costly to split dat into all those .SD and stack them. An alternative (maybe generally faster?) approach would be to select rows to keep dat[dat[,!any(is.na(value)),by="id1,id2"]$V1]
Ah, you're right. I did test it, but somehow convinced myself that what I saw was the right answer. The alternative I should have mentioned is: dat[dat[,.I[!any(is.na(value))],by="id1,id2"]$V1]
@Frank that's a nice option too.
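
To make the row-index variant from the comments concrete, here is a small sketch on the question's example data. Within a group, .I holds that group's row numbers in dat, so subsetting it with a single TRUE keeps them all and with a single FALSE keeps none. The first variant in the comments returns one logical per group rather than one per row, which is why it cannot be used directly as a row filter. The names keep_rows and result are illustrative, and anyNA(value) is used here in place of any(is.na(value)).

```r
library(data.table)

dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Collect the row numbers of every group that has no NA in value.
# Groups containing an NA contribute no row numbers at all.
keep_rows <- dat[, .I[!anyNA(value)], by = .(id1, id2)]$V1
# keep_rows is 1 2 4 5

# Subset the original table once, by row number.
result <- dat[keep_rows]
```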
