R remove duplicates based on other columns

Question

I want to remove duplicates based on whether the values in other columns match or differ.

All rows of a duplicated ID should be removed completely, but only if they have DIFFERENT colours. It doesn't matter whether the subgroups differ as well. If rows have the same ID AND the same colour, only the first one should be kept.

In the end I want a list of all IDs that are single-coloured (regardless of subgroup). All multicoloured IDs should be removed.

Here is an example:

   id colour   subgroup
1   1    red   lightred
2   2   blue  lightblue
3   2   blue   darkblue
4   3    red   lightred
5   4    red    darkred
6   4    red    darkred
7   4   blue  lightblue
8   5  green  darkgreen
9   5  green  darkgreen
10  5  green lightgreen
11  6    red    darkred
12  6   blue   darkblue
13  6  green lightgreen

At the end it should look like this:

  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen

The data I used for this example:

id <- c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6)
colour <- c("red", "blue", "blue", "red", "red", "red", "blue", "green", "green", "green", "red", "blue", "green")
subgroup <- c("lightred", "lightblue", "darkblue", "lightred", "darkred", "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen", "darkred", "darkblue", "lightgreen")
data <- data.frame(cbind(id, colour, subgroup))

Thanks for your help!

Onyambu · Accepted Answer · 2018-06-29 06:25:34Z

library(tidyverse)
data %>%
  group_by(id) %>%
  # keep ids with exactly one distinct colour, and only the first row of each
  filter(n_distinct(colour) == 1, !duplicated(colour))
# A tibble: 4 x 3
# Groups:   id [4]
  id    colour subgroup 
  <fct> <fct>  <fct>    
1 1     red    lightred 
2 2     blue   lightblue
3 3     red    lightred 
4 5     green  darkgreen

Using Base R:

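# as.character() keeps ave() working whether colour is a character vector or a factor
subset(data, as.logical(ave(as.character(colour), id, FUN = function(x) length(unique(x)) == 1 & !duplicated(x))))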
  id colour  subgroup
1  1    red  lightred
2  2   blue lightblue
4  3    red  lightred
8  5  green darkgreen
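
The same rule can also be spelled out step by step in base R, which may be easier to follow than the one-liner. This is only a sketch: it rebuilds the example data without `cbind()` (so `id` stays numeric, an assumption that differs from the question's setup), and the names `first_rows`, `n_colours`, `single_ids` and `result` are made up for illustration.

```r
# Example data from the question, built directly so id stays numeric
data <- data.frame(
  id = c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  colour = c("red", "blue", "blue", "red", "red", "red", "blue",
             "green", "green", "green", "red", "blue", "green"),
  subgroup = c("lightred", "lightblue", "darkblue", "lightred", "darkred",
               "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen",
               "darkred", "darkblue", "lightgreen"),
  stringsAsFactors = FALSE
)

# Step 1: drop repeated (id, colour) pairs, keeping the first row of each pair
first_rows <- data[!duplicated(data[c("id", "colour")]), ]

# Step 2: rows left per id now equal the number of distinct colours for that id
n_colours <- table(first_rows$id)

# Step 3: keep only the ids that have exactly one colour
single_ids <- names(n_colours)[as.vector(n_colours) == 1]
result <- first_rows[as.character(first_rows$id) %in% single_ids, ]
result
```

Because the rows are subset rather than rebuilt, the original row names are preserved in `result`.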

hannes101 · Accepted Answer · 2018-06-29 07:22:20Z

Here is a short data.table solution. It first keeps only the unique (id, colour) combinations, then keeps the ids for which exactly one such combination remains.

library(data.table)
dt.data <- data.table(data)
# keep the first row per (id, colour), count rows per id, keep ids with a single row
dt.data[!duplicated(dt.data, by = c("id", "colour")),
        .(colour, subgroup, .N),
        by = id][N == 1, .(id, colour, subgroup)]
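
The two steps described above can also be written as separate statements instead of one chained call. The following is a sketch under a few assumptions: it builds the example table directly with `data.table()` rather than converting `data`, it uses `unique()` with `by` in place of `!duplicated()`, and the names `dt`, `dt_unique`, `n_colours` and `result_dt` are illustrative only.

```r
library(data.table)

# Example data from the question, built directly as a data.table
dt <- data.table(
  id = c(1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  colour = c("red", "blue", "blue", "red", "red", "red", "blue",
             "green", "green", "green", "red", "blue", "green"),
  subgroup = c("lightred", "lightblue", "darkblue", "lightred", "darkred",
               "darkred", "lightblue", "darkgreen", "darkgreen", "lightgreen",
               "darkred", "darkblue", "lightgreen")
)

# Step 1: keep the first row of every (id, colour) combination
dt_unique <- unique(dt, by = c("id", "colour"))

# Step 2: count the remaining rows per id, i.e. the number of distinct colours
dt_unique[, n_colours := .N, by = id]

# Step 3: keep ids with exactly one colour and drop the helper column
result_dt <- dt_unique[n_colours == 1, .(id, colour, subgroup)]
result_dt
```

Splitting the chain this way makes it possible to inspect `dt_unique` and confirm which ids have more than one colour before they are filtered out.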
