0

So I have data as such

USER    source
A       orange
A       apple
B       banana
B       banana
B       grape
C       grape
D       orange
D       orange

All I want to do is show the source which appeared the most for each USER. BUT in addition, there are instances where there are ties, in which case both winners of the tie should be represented as .50 in a weight variable.

My final result in doing this to the above data should be as follows:

USER    source   weight
A       orange   .5
A       apple    .5
B       banana   1
C       grape    1
D       orange   1

So basically, I only use the weight column because there can be ties that need accounting for..., but I want to have as many 1's as possible.

In R code:

data <- structure(list(USER = c("A", "A", "B", "B", "B", "C", "D", "D"
), source = c("orange", "apple", "banana", "banana", "grape", 
"grape", "orange", "orange")), row.names = c(NA, -8L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x000001f74a701ef0>)

0

1 Answer 1

2

Using the Modes function from here you may solve this with dplyr.

library(dplyr)

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

data %>%
  group_by(USER) %>%
  summarise(source = Modes(source)) %>%
  mutate(weight = 1/n()) %>%
  ungroup

#  USER  source weight
#  <chr> <chr>   <dbl>
#1 A     orange    0.5
#2 A     apple     0.5
#3 B     banana    1  
#4 C     grape     1  
#5 D     orange    1  
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.