
I want to delete rows based on conditions in two different columns, within each group. In my case, I want to remove a "Death" that occurs at a patient's first admission, but keep a "Death" that occurs at a readmission, for each patient id.

Here is the initial data.frame:

ConditionI <- c("2017-01-01", "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-03-01", "2018-04-01","2018-04-10")

ConditionII <- c("Death", "Alive", "Alive", "Death", "Alive", "Alive", "Death", "Alive", "Death")

id <- c("A","B","B","B","C","C","D","E","E")

df <- data.frame(id,ConditionI,ConditionII)

My goal is:

ConditionII <- c( "Alive", "Alive", "Death", "Alive", "Alive", "Alive", "Death")
ConditionI <- c( "2018-01-01", "2018-01-15", "2018-01-20", "2018-02-01", "2018-02-1", "2018-04-01","2018-04-10")
id <- c("B","B","B","C","C","E","E")

df <- data.frame(id,ConditionI,ConditionII)

I thought this was a very basic question, but I tried several times and couldn't get the answer. Your help is very much appreciated. Thanks in advance!

2 Answers


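You can remove rows where 'Death' occurs on row number 1 in each group. This relies on the rows already being in chronological order within each id, as they are in the example data.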

library(dplyr)

df %>%
  group_by(id) %>%
  filter(!(row_number() == 1 & ConditionII == 'Death'))

#  id    ConditionI ConditionII
#  <chr> <chr>      <chr>      
#1 B     2018-01-01 Alive      
#2 B     2018-01-15 Alive      
#3 B     2018-01-20 Death      
#4 C     2018-02-01 Alive      
#5 C     2018-02-1  Alive      
#6 E     2018-04-01 Alive      
#7 E     2018-04-10 Death     

The same logic using data.table:

library(data.table)
setDT(df)[, .SD[!(seq_len(.N) == 1 & ConditionII == 'Death')], id]
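If the rows are not guaranteed to be sorted, "row number 1" may not be the first admission. A minimal base R sketch of the same rule with an explicit sort, assuming ConditionI holds dates that as.Date() can parse; the shuffled data frame below is made up for illustration:

```r
# Hypothetical input: same columns as the question, rows deliberately out of order
df <- data.frame(
  id          = c("B", "A", "B", "E", "B", "E"),
  ConditionI  = c("2018-01-20", "2017-01-01", "2018-01-01",
                  "2018-04-10", "2018-01-15", "2018-04-01"),
  ConditionII = c("Death", "Death", "Alive", "Death", "Alive", "Alive")
)

# Sort by patient, then by admission date, so that the first row
# of each id really is the first admission
df <- df[order(df$id, as.Date(df$ConditionI)), ]

# Position of each row within its id group (1 = first admission)
pos <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)

# Drop a row only if it is both the first admission and a death
result <- df[!(pos == 1 & df$ConditionII == "Death"), ]
print(result)
```

Here patient A's only row is dropped, while the deaths of B and E are kept because they occur at a later admission.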

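We can use subset with duplicated from base R directly. Note that this drops every row of an id whose first row is 'Death', not only that first row; on the example data the result is the same because those ids (A and D) have a single row each.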

subset(df,  !id %in% id[!duplicated(id) & ConditionII == 'Death'])
#   id ConditionI ConditionII
#2  B 2018-01-01       Alive
#3  B 2018-01-15       Alive
#4  B 2018-01-20       Death
#5  C 2018-02-01       Alive
#6  C  2018-02-1       Alive
#8  E 2018-04-01       Alive
#9  E 2018-04-10       Death

Or with dplyr

library(dplyr)
df %>%
    filter( !id %in% id[!duplicated(id) & ConditionII == 'Death'])
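The row-wise filter (row number 1 per group) and the id-based filter above only differ when an id has more rows after a first-row 'Death'. A small sketch on made-up data (patient "X" is hypothetical and does not appear in the question) shows the difference:

```r
# Hypothetical data: patient "X" has a death on the first row plus a later row
df <- data.frame(
  id          = c("X", "X", "Y", "Y"),
  ConditionI  = c("2018-01-01", "2018-02-01", "2018-03-01", "2018-03-10"),
  ConditionII = c("Death", "Alive", "Alive", "Death")
)

# Row-wise rule: drop only the first row of a group when it is a death
pos <- ave(seq_len(nrow(df)), df$id, FUN = seq_along)
row_wise <- df[!(pos == 1 & df$ConditionII == "Death"), ]

# Id-based rule (subset + duplicated): drop every row of such an id
group_wise <- subset(df, !id %in% id[!duplicated(id) & ConditionII == "Death"])

nrow(row_wise)    # 3: X's second row is kept
nrow(group_wise)  # 2: both rows of X are dropped
```

Which behaviour is wanted depends on whether rows after a first-admission death should be treated as valid records or as part of a record to discard.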
