1

Let's say I have a dataframe that looks like:

owner     unit_id   detail
abc123    002NH034  94847DT
abc123    002NH034  94868DT
abc123    002NH034  94889DT
abc123    112NH035  94899DT
abc123    112NH036
abc123    112NH037  

I'm trying to roll up to the level of the first column, while counting the number of distinct values in the second column that have a value in the third column. So expected output would be:

abc123  2

I've tried a mix of sum, is.na, and a few other dplyr functions but am a little stuck. I've tried a variation of the following with random other functions but keep getting sums of all the rows.

df %>%  group_by(owner) %>% summarise(unit_ids_with_detail = sum(!is.na(detail))

2 Answers 2

1

This should do:

df %>% filter(!is.na(detail)) %>%  group_by(owner) %>% 
summarise(unit_ids_with_detail = n_distinct(unit_id))

Remove NAs from third column, group by first column, count distinct values on second column. This will drop owners that have only NAs on the third column, though

Sign up to request clarification or add additional context in comments.

2 Comments

Nice, +1. You could also use tidyr::drop_na(detail) as a slight shortcut for dplyr::filter(!is.na(detail)).
Maybe I should've mentioned that I can't do filtering, since I'm already doing n_distinct(unit_id) to get the number of all unit_ids
0

If we want the distinct elements, use n_distinct - Grouped by 'owner', use n_distinct on the 'unit_id' and specify na.rm = TRUE to remove the NA elements

library(dplyr)
df %>%  
   group_by(owner) %>% 
   summarise(unit_ids_with_detail = n_distinct(detail, na.rm = TRUE))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.