How do I count the number of distinct values of a column based on another column criterion?

Question

Let's say I have a dataframe that looks like:

owner     unit_id   detail
abc123    002NH034  94847DT
abc123    002NH034  94868DT
abc123    002NH034  94889DT
abc123    112NH035  94899DT
abc123    112NH036
abc123    112NH037

I'm trying to roll up to the level of the first column, while counting the number of distinct values in the second column that have a value in the third column. So expected output would be:

abc123  2

I've tried a mix of sum, is.na, and a few other dplyr functions but am a little stuck. I've tried a variation of the following with random other functions but keep getting sums of all the rows.

df %>%  group_by(owner) %>% summarise(unit_ids_with_detail = sum(!is.na(detail)))

Juan C · Accepted Answer · 2022-12-27 20:25:49Z

1

This should do:

df %>% filter(!is.na(detail)) %>%  group_by(owner) %>% 
summarise(unit_ids_with_detail = n_distinct(unit_id))

This removes the rows with NA in the third column, groups by the first column, and counts the distinct values in the second column. Note that it drops any owner whose third column contains only NAs.
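As a quick check, here is a minimal sketch that rebuilds the example data from the question and runs the filter-based pipeline. It assumes the two blank `detail` cells are `NA`, and it adds a hypothetical second owner (`xyz789`, not in the original data) whose `detail` is always `NA`, to show the dropped-owner caveat.

```r
library(dplyr)

# Example data from the question; blank 'detail' cells are assumed to be NA.
# The last row (owner "xyz789") is hypothetical and only illustrates the caveat.
df <- data.frame(
  owner   = c(rep("abc123", 6), "xyz789"),
  unit_id = c("002NH034", "002NH034", "002NH034",
              "112NH035", "112NH036", "112NH037", "999ZZ001"),
  detail  = c("94847DT", "94868DT", "94889DT", "94899DT", NA, NA, NA)
)

result <- df %>%
  filter(!is.na(detail)) %>%   # keep only rows that have a detail
  group_by(owner) %>%
  summarise(unit_ids_with_detail = n_distinct(unit_id))

# abc123 has 2 distinct unit_ids with a detail; xyz789 has no row at all.
print(result)
```

Because the filter runs before the grouping, `xyz789` never reaches `summarise`, so it is missing from the result rather than reported with a count of 0.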

edited Dec 27, 2022 at 20:25

answered Dec 27, 2022 at 20:22


2 Comments

zephryl Over a year ago

Nice, +1. You could also use tidyr::drop_na(detail) as a slight shortcut for dplyr::filter(!is.na(detail)).

simplycoding Over a year ago

Maybe I should've mentioned that I can't do filtering, since I'm already doing n_distinct(unit_id) to get the number of all unit_ids
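One possible way to meet that constraint, sketched here under the same assumptions about the data (blank `detail` cells are `NA`, and the `xyz789` row is hypothetical), is to subset `unit_id` inside `summarise` instead of filtering the whole data frame. That lets the count of all unit_ids and the count of unit_ids with a detail sit side by side.

```r
library(dplyr)

# Example data from the question; blank 'detail' cells are assumed to be NA.
# The last row (owner "xyz789") is hypothetical.
df <- data.frame(
  owner   = c(rep("abc123", 6), "xyz789"),
  unit_id = c("002NH034", "002NH034", "002NH034",
              "112NH035", "112NH036", "112NH037", "999ZZ001"),
  detail  = c("94847DT", "94868DT", "94889DT", "94899DT", NA, NA, NA)
)

result <- df %>%
  group_by(owner) %>%
  summarise(
    # distinct unit_ids over all rows of the group
    unit_ids = n_distinct(unit_id),
    # distinct unit_ids restricted to rows where detail is present
    unit_ids_with_detail = n_distinct(unit_id[!is.na(detail)])
  )

print(result)
```

With this form no rows are removed up front, so an owner with only `NA` details stays in the output with a count of 0 instead of disappearing.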

akrun · Accepted Answer · 2022-12-27 20:23:09Z

0

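If we want the number of distinct elements, use n_distinct. Grouped by 'owner', subset 'unit_id' to the rows where 'detail' is not NA and apply n_distinct to that subset. No filter step is needed, so other summaries over all rows can go in the same summarise call.

library(dplyr)
df %>%
   group_by(owner) %>%
   # count distinct unit_id values among rows that have a detail
   summarise(unit_ids_with_detail = n_distinct(unit_id[!is.na(detail)]))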

answered Dec 27, 2022 at 20:23

