Count number of unique values in two columns by group

Question

I have a data frame with IDs for web page ('Webpage'), department ('Dept') and employee ('Emp_ID'):

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_ID = c(1, 1, 2, 3, 4, 4)) 

#   Webpage Dept Emp_ID
# 1     111  101      1
# 2     111  101      1
# 3     111  101      2
# 4     111  102      3
# 5     222  102      4
# 6     222  103      4

I want to know how many unique individuals have seen each web page.

For example, in this dataset web page 111 has been seen by three individuals, where an individual is a unique combination of Dept and Emp_ID: Emp_ID 1 and 2 in Dept 101, and Emp_ID 3 in Dept 102. Similarly, web page 222 has been seen by two different individuals.

My first attempt is:

nrow(unique(df[ , c("Dept", "Emp_ID")]))

Using unique I can do this for one web page, but can someone please suggest how I can calculate it for all web pages?

Yuriy Saraykin · Accepted Answer · 2021-03-30 08:20:50Z

2

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
library(dplyr)

df %>% 
  group_by(Webpage) %>% 
  summarise(n = n_distinct(Dept, Emp_Id))
#> # A tibble: 2 x 2
#>   Webpage     n
#>     <dbl> <int>
#> 1     111     3
#> 2     222     2

library(data.table)
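# paste with a separator so that distinct (Dept, Emp_Id) pairs cannot collapse to the same string
setDT(df)[, list(n = uniqueN(paste(Dept, Emp_Id, sep = "_"))), by = Webpage]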
#>    Webpage n
#> 1:     111 3
#> 2:     222 2

Created on 2021-03-30 by the reprex package (v1.0.0)
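
A caveat for any approach that builds a key by pasting columns together: without a separator, two different pairs can produce the same string and be counted as one. The sketch below uses a small hypothetical data frame (`pairs`, not part of the question's data) to illustrate the collision and two ways around it.

```r
# Hypothetical data: two different (Dept, Emp_Id) pairs
pairs <- data.frame(Dept = c(10, 101), Emp_Id = c(11, 1))

# Without a separator both rows become the string "1011"
key_no_sep <- paste0(pairs$Dept, pairs$Emp_Id)

# With a separator the keys stay distinct: "10_11" and "101_1"
key_sep <- paste(pairs$Dept, pairs$Emp_Id, sep = "_")

length(unique(key_no_sep))  # 1 (two individuals counted as one)
length(unique(key_sep))     # 2
nrow(unique(pairs))         # 2 (comparing whole rows avoids string keys entirely)
```

For the data in the question the two variants happen to give the same counts, so this only matters when IDs have varying numbers of digits.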


2 Comments

Henrik Over a year ago

An alternative data.table solution would be df[ , .(n = uniqueN(.SD)), by = Webpage]

Henrik Over a year ago

Or more explicit about which columns to include in .SD: df[ , .(n = uniqueN(.SD)), by = Webpage, .SDcols = c("Dept", "Emp_Id")], if there are additional columns which should not be considered in the calculation.
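
To illustrate the point of the second comment, here is a minimal sketch that adds a hypothetical extra column (`Visit_Time`, not in the original data) and compares the two calls. It assumes the data.table package is installed.

```r
library(data.table)

dt <- data.table(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4),
                 Visit_Time = 1:6)  # hypothetical extra column, unique per row

# .SD holds all non-grouping columns, so Visit_Time makes every row distinct
res_all <- dt[, .(n = uniqueN(.SD)), by = Webpage]
res_all$n   # 4 2

# Restricting .SD to the two ID columns gives the intended count
res_cols <- dt[, .(n = uniqueN(.SD)), by = Webpage, .SDcols = c("Dept", "Emp_Id")]
res_cols$n  # 3 2
```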

Ronak Shah · Accepted Answer · 2021-03-30 08:10:23Z

2

For each Webpage, count the unique combinations of the two columns using duplicated().

library(dplyr)

df %>%
  group_by(Webpage) %>%
  summarise(n_viewers = sum(!duplicated(cur_data())))

#  Webpage n_viewers
#    <dbl>     <int>
#1     111         3
#2     222         2

data

Provide data in a reproducible format which is easier to copy rather than an image.

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222), 
                 Dept = c(101, 101, 101, 102, 102, 103), 
                 Emp_Id = c(1, 1, 2, 3, 4, 4))
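
A note for newer dplyr versions: as of dplyr 1.1.0, `cur_data()` is deprecated in favour of `pick()`. The sketch below shows the same duplicated() idea written with `pick()`, assuming dplyr 1.1.0 or later is installed. Naming the columns explicitly also keeps any additional columns out of the comparison.

```r
library(dplyr)

df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4))

res <- df %>%
  group_by(Webpage) %>%
  # pick() returns the selected columns for the current group as a data frame;
  # duplicated() then flags repeated (Dept, Emp_Id) rows within that group
  summarise(n_viewers = sum(!duplicated(pick(Dept, Emp_Id))))

res$n_viewers  # 3 2
```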


ThomasIsCoding · Accepted Answer · 2021-03-30 08:17:13Z

0

Hope aggregate can help

> aggregate(cbind(n_viewer = Emp_Id) ~ Webpage, unique(df), length)
  Webpage n_viewer
1     111        3
2     222        2
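
The one-liner above can be read as two steps: drop duplicate rows, then count the rows left per Webpage. A minimal base R sketch of those steps follows. Selecting the three relevant columns explicitly is an addition here (the answer calls `unique(df)` on the whole data frame), so that any other columns would not affect the result.

```r
df <- data.frame(Webpage = c(111, 111, 111, 111, 222, 222),
                 Dept    = c(101, 101, 101, 102, 102, 103),
                 Emp_Id  = c(1, 1, 2, 3, 4, 4))

# Step 1: keep one row per (Webpage, Dept, Emp_Id) combination
u <- unique(df[, c("Webpage", "Dept", "Emp_Id")])

# Step 2: count the remaining rows per Webpage
res <- aggregate(Emp_Id ~ Webpage, data = u, FUN = length)
names(res)[2] <- "n_viewer"
res$n_viewer  # 3 2

# The same count as a named table, also base R
counts <- table(u$Webpage)
```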
