How to remove rows based on multiple conditions

Question

I want to delete rows based on conditions in two different columns, within each group. In my case, I want to remove "Death" when it occurs at the first admission but keep "Death" when it occurs at a readmission, for each patient id.

Here is the initial data.frame:

ConditionI <- c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01","2018-04-10")

ConditionII <- c("Death", "Alive", "Alive", "Death", "Alive", "Alive", "Death", "Alive", "Death")

id <- c("A","B","B","B","C","C","D","E","E")

df <- data.frame(id, ConditionI, ConditionII)

My goal is:

ConditionII <- c( "Alive", "Alive", "Death", "Alive", "Alive", "Alive", "Death")
ConditionI <- c( "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-04-01","2018-04-10")
id <- c("B","B","B","C","C","E","E")

df <- data.frame(id,ConditionI,ConditionII)

I thought this was a very basic question, but I tried several times and didn't get the answer. Your help is very much appreciated. Thanks in advance!

Ronak Shah · Accepted Answer · 2020-06-28 01:23:59Z

1

You can remove rows where 'Death' occurs on row number 1 in each group.

library(dplyr)

df %>%
  group_by(id) %>%
  filter(!(row_number() == 1 & ConditionII == 'Death'))

#  id    ConditionI ConditionII
#  <chr> <chr>      <chr>      
#1 B     2018-01-01 Alive      
#2 B     2018-01-15 Alive      
#3 B     2018-01-20 Death      
#4 C     2018-02-01 Alive      
#5 C     2018-02-1  Alive      
#6 E     2018-04-01 Alive      
#7 E     2018-04-10 Death

The same logic using data.table:

library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
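Both snippets treat the first row of each id as the first admission, so they rely on the rows already being in chronological order. If that is not guaranteed, a minimal sketch along these lines sorts first. It assumes ConditionI holds the admission date (the question does not say so explicitly), and the name `result` is only for illustration:

```r
library(dplyr)

# Rebuild the question's data so the example is self-contained
df <- data.frame(
  id = c("A", "B", "B", "B", "C", "C", "D", "E", "E"),
  ConditionI = c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20",
                 "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01",
                 "2018-04-10"),
  ConditionII = c("Death", "Alive", "Alive", "Death", "Alive",
                  "Alive", "Death", "Alive", "Death"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(ConditionI = as.Date(ConditionI)) %>%  # assumption: ConditionI is a date
  arrange(id, ConditionI) %>%                   # earliest admission first within each id
  group_by(id) %>%
  filter(!(row_number() == 1 & ConditionII == "Death")) %>%
  ungroup()
```

Converting to Date also normalises the entry "2018-02-1", so it sorts as a date rather than as text.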


akrun · Accepted Answer · 2020-06-28 03:12:27Z

1

We can use subset with duplicated from base R directly:

subset(df,  !id %in% id[!duplicated(id) & ConditionII == 'Death'])
#   id ConditionI ConditionII
#2  B 2018-01-01       Alive
#3  B 2018-01-15       Alive
#4  B 2018-01-20       Death
#5  C 2018-02-01       Alive
#6  C  2018-02-1       Alive
#8  E 2018-04-01       Alive
#9  E 2018-04-10       Death

Or with dplyr:

library(dplyr)
df %>%
    filter( !id %in% id[!duplicated(id) & ConditionII == 'Death'])
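Note that this filter drops every row of an id whose first row is 'Death', whereas a row-wise rule drops only that first row. The two agree on the question's data because patients A and D have a single row each. A small sketch of the difference, using a hypothetical patient "F" that is not in the original data (the names `toy`, `row_rule` and `group_rule` are also illustrative):

```r
# Hypothetical data: "F" has a "Death" first row followed by another row.
# Used only to show where the two rules diverge.
toy <- data.frame(
  id = c("F", "F", "G"),
  ConditionII = c("Death", "Alive", "Alive"),
  stringsAsFactors = FALSE
)

# Row-wise rule: drop only a first row that is "Death"
first_row <- ave(seq_along(toy$id), toy$id, FUN = seq_along) == 1
row_rule <- toy[!(first_row & toy$ConditionII == "Death"), ]

# Group-wise rule: drop all rows of any id whose first row is "Death"
group_rule <- subset(toy, !id %in% id[!duplicated(id) & ConditionII == "Death"])
```

Here `row_rule` keeps the second row of "F", while `group_rule` keeps only "G". Which one is wanted depends on whether later rows for such a patient should survive.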

edited Jun 28, 2020 at 3:12

answered Jun 28, 2020 at 2:54
