Delete duplicate rows based on condition in another column

Question

Let's say I have this data frame:

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE)

Rows 2 and 3 are duplicates, except that row 3 has an NA in column y where row 2 has 2. I want to delete the duplicate row that has the NA value, so that the result looks like this:

df <- data.frame(
  a = c(NA, 6, 8),
  x = c(1, 2, 4),
  y = c(NA, 2, NA),
  z = c("apple", 2, NA),
  d = c(NA, 5, 5),
  stringsAsFactors = FALSE)

I tried this, but it doesn't work (it returns no rows):

df2 <- df %>% group_by(a, x, z, d) %>% filter(y == max(y))

Any suggestions?
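A likely reason the attempt returns no rows: every (a, x, z, d) group contains an NA in y, so max(y) is NA, the comparison y == max(y) is NA, and filter() only keeps rows where the condition is TRUE. A minimal base-R check of that behaviour on the y values of rows 2 and 3 (the variable names are illustrative only):

```r
# Base-R check of why y == max(y) keeps nothing in the rows 2/3 group
y <- c(2, NA)

group_max <- max(y)                   # NA: max() propagates missing values
group_max_rm <- max(y, na.rm = TRUE)  # 2: the NA is dropped first

cmp <- y == group_max                 # NA NA: comparing with NA gives NA
keep <- !is.na(cmp) & cmp             # filter() keeps a row only when TRUE
print(keep)                           # FALSE FALSE
```

Adding na.rm = TRUE on its own is not enough either: for groups where y is entirely NA (rows 1 and 4), max() returns -Inf with a warning, y == -Inf is NA, and those rows would still be dropped.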

Onyambu · Accepted Answer · 2021-06-22 02:42:50Z

1

library(dplyr)
library(tidyr)

df %>%
  arrange_all() %>%
  filter(!duplicated(fill(., everything())))
   a x  y     z  d
1 NA 1 NA apple NA
2  6 2  2     2  5
3  8 4 NA  <NA>  5
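The pipeline above can be read in two steps: fill() is only used to build a comparison copy in which each NA borrows the value from the row above, and duplicated() on that copy decides which rows of the unfilled data to keep. A minimal sketch splitting it into named intermediate objects (arranged, key and result are illustrative names, not part of the answer):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Sort by every column; dplyr's arrange() puts NA last, so the row
# with y = 2 is placed before its duplicate with y = NA
arranged <- arrange_all(df)

# Comparison copy only: NA cells are filled downwards from the row above
key <- fill(arranged, everything())

# Flag rows whose filled version was already seen, but subset the
# original (unfilled) data so the returned values are unchanged
result <- arranged[!duplicated(key), ]
result
```

Printing key next to result is a quick way to see which rows were treated as duplicates on real data.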

edited Jun 22, 2021 at 2:42

answered Jun 22, 2021 at 2:33

Onyambu


foreach · Accepted Answer · 2021-06-22 02:20:43Z

0

library(dplyr)

df %>% arrange(a, x, z, d) %>% distinct(a, x, z, d, .keep_all = TRUE)

   a x  y     z  d
1  6 2  2     2  5
2  8 4 NA  <NA>  5
3 NA 1 NA apple NA

answered Jun 22, 2021 at 2:20

foreach

Gerhard Over a year ago

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
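As the comment notes, the one-liner above benefits from an explanation. distinct(a, x, z, d, .keep_all = TRUE) keeps the first row of each (a, x, z, d) combination, and nothing in arrange(a, x, z, d) looks at y, so which of rows 2 and 3 survives depends on how that tie is ordered rather than on y. A minimal sketch that makes the choice explicit by sorting rows with an observed y first, assuming (as in the question) that y is the only column in which the duplicates differ:

```r
library(dplyr)

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE
)

result <- df %>%
  # FALSE sorts before TRUE, so rows with a non-missing y come first
  arrange(is.na(y)) %>%
  # keep the first row of each a/x/z/d combination (NA counts as a value)
  distinct(a, x, z, d, .keep_all = TRUE)

result
```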

Ronak Shah · Accepted Answer · 2021-06-22 03:42:01Z

0

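Fill NA values with the previous non-NA value in each column, then keep the unique rows with distinct(). Note that fill() also changes the returned values: the NA cells in the last row are filled from the row above.

library(dplyr)
library(tidyr)

df %>% fill(everything()) %>% distinct()

#   a x  y     z  d
#1 NA 1 NA apple NA
#2  6 2  2     2  5
#3  8 4  2     2  5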

answered Jun 22, 2021 at 3:42

Ronak Shah

karuno Over a year ago

Thank you! Is there a way to do it without fill and distinct? Something with group_by or filter?

Ronak Shah Over a year ago

group_by() would not work directly, because the NA could be in any column, not only y, and a row with an NA then has a different value from its duplicate in that column. You can use df %>% fill(everything()) %>% group_by(across()) %>% slice(1L)
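For the narrower case in the question, where the duplicates differ only in y, a group_by()/filter() version is possible: within each (a, x, z, d) group, keep the rows with an observed y, or keep the group as-is when every y in it is missing. This is a minimal sketch under that assumption; as the comment above says, it does not handle duplicates whose NA sits in one of the grouping columns, because those rows end up in different groups.

```r
library(dplyr)

df <- data.frame(
  a = c(NA, 6, 6, 8),
  x = c(1, 2, 2, 4),
  y = c(NA, 2, NA, NA),
  z = c("apple", 2, "2", NA),
  d = c(NA, 5, 5, 5),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(a, x, z, d) %>%
  # keep rows with a non-missing y; if the whole group has y = NA, keep it
  filter(!is.na(y) | all(is.na(y))) %>%
  ungroup()

result
```

If exact duplicates with the same non-missing y can also occur, a distinct() step afterwards would still be needed.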
