Representative dataset:

ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
1234,NA,12-10-2013,10
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
4532,05-03-2015,01-01-2016,40
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30
7894,04-09-2016,29-10-2016,27
7894,04-09-2016,10-10-2017,26

I would like to filter this to rows where result > 20 and where date_of_bloods is earlier than drug_start in the same row. Where drug_start is NA, I would like the rows filtered on result > 20 only. The desired outcome would be:

ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30

So far I have tried:

table <- readxl::read_excel("data.xlsx", sheet = "bloods")
table <- as.data.frame(table)
table$result <- as.numeric(table$result)
table<- subset(table, result >20)
table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d/%m/%Y")
table$drug_start <- as.Date(table$drug_start, format="%d/%m/%Y")
table[!(table$drug_start >= table$date_of_bloods),]

The final line does not work: every row where drug_start is NA comes back with NA in all columns, and the remaining rows have not been filtered by date.

Your help would be greatly welcomed

  • Please provide minimal and reproducible example(s). Use dput() for data... Commented Jul 2, 2020 at 8:46

1 Answer

You can convert the date columns to actual Date objects and use subset to select rows where result is greater than 20 and either date_of_bloods < drug_start or drug_start is NA.

table$date_of_bloods <- as.Date(table$date_of_bloods, format="%d-%m-%Y")
table$drug_start <- as.Date(table$drug_start, format="%d-%m-%Y")

subset(table, result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))

#    ID drug_start date_of_bloods result
#1 1234       <NA>     2010-10-12     80
#2 1234       <NA>     2011-10-12     50
#4 4532 2015-03-05     2013-01-01     80
#5 4532 2015-03-05     2014-01-01     60
#7 7894 2016-09-04     2012-08-03     40
#8 7894 2016-09-04     2014-08-03     38
#9 7894 2016-09-04     2015-06-02     30
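
For anyone who wants to run this without the Excel file, the following is a minimal self-contained sketch in base R. It rebuilds the question's sample data with read.csv(text = ...); the object names `bloods` and `kept` are illustrative choices, not names from the original post, and the dates are assumed to arrive as "dd-mm-yyyy" character strings as shown in the question.

```r
# Rebuild the sample data from the question as a data frame.
# "NA" in drug_start is read as a missing value by default.
bloods <- read.csv(text = "ID,drug_start,date_of_bloods,result
1234,NA,12-10-2010,80
1234,NA,12-10-2011,50
1234,NA,12-10-2013,10
4532,05-03-2015,01-01-2013,80
4532,05-03-2015,01-01-2014,60
4532,05-03-2015,01-01-2016,40
7894,04-09-2016,03-08-2012,40
7894,04-09-2016,03-08-2014,38
7894,04-09-2016,02-06-2015,30
7894,04-09-2016,29-10-2016,27
7894,04-09-2016,10-10-2017,26", stringsAsFactors = FALSE)

# Parse both date columns using the separator that the data actually has
bloods$date_of_bloods <- as.Date(bloods$date_of_bloods, format = "%d-%m-%Y")
bloods$drug_start <- as.Date(bloods$drug_start, format = "%d-%m-%Y")

# Keep result > 20 and either bloods taken before drug start or no drug start
kept <- subset(bloods, result > 20 &
                 (date_of_bloods < drug_start | is.na(drug_start)))
```

Using a name other than `table` also avoids masking the base R function table().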

The same logic using dplyr, starting from the original character date columns:

library(dplyr)

table %>%
  mutate(across(c(drug_start, date_of_bloods), lubridate::dmy)) %>%
  filter(result > 20 & (date_of_bloods < drug_start | is.na(drug_start)))

5 Comments

Thanks Ronak, that worked perfectly. To clarify for my understanding: when I was using as.Date in my attempt, I should have been using '-' instead of '/'? Further to that, did my filtering fail because I wasn't using subset?
In the format argument of as.Date you should specify the format your data actually has. Your data is in the format "%d-%m-%Y" with "-", but you specified "%d/%m/%Y" with "/", which turned all the dates to NA, so your next line did not work since all the dates were NA.
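
The effect of the mismatched separator can be reproduced with two dates taken from the question. This is a minimal sketch; the vector names `x`, `wrong` and `right` are illustrative only.

```r
# Two date strings in the day-month-year layout used in the question
x <- c("12-10-2010", "05-03-2015")

# The "/" in the format does not match the "-" in the data, so parsing fails
wrong <- as.Date(x, format = "%d/%m/%Y")

# The separator matches the data, so the dates parse
right <- as.Date(x, format = "%d-%m-%Y")

is.na(wrong)   # TRUE for both elements
format(right)  # dates printed in ISO yyyy-mm-dd form
```

as.Date does not raise an error when the format does not match; it quietly returns NA, which is why the problem only showed up at the filtering step.
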
Thank you again for taking the time to help, that makes perfect sense! Out of interest, would table[!(table$drug_start >= table$date_of_bloods),] then work if the dates had been formatted properly?
Yes, it would, but then it would not select the first 2 NA rows. For that you need to add another condition using is.na, as mentioned in my answer.
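
The behaviour of NA inside a logical row index can be seen on a three-row example. This is a sketch on a hypothetical subset of the question's data; `df`, `cond`, `with_na_rows`, `keep` and `kept` are illustrative names.

```r
# Small illustrative data frame with one missing drug_start
df <- data.frame(
  ID = c(1234, 4532, 4532),
  drug_start = as.Date(c(NA, "05-03-2015", "05-03-2015"), format = "%d-%m-%Y"),
  date_of_bloods = as.Date(c("12-10-2010", "01-01-2014", "01-01-2016"),
                           format = "%d-%m-%Y"),
  result = c(80, 60, 40)
)

# Comparing against NA gives NA rather than TRUE or FALSE
cond <- df$date_of_bloods < df$drug_start   # NA TRUE FALSE

# An NA in a logical index produces a row filled with NA
with_na_rows <- df[cond, ]

# Treat a missing drug_start as "keep the row"
keep <- cond | is.na(df$drug_start)         # TRUE TRUE FALSE
kept <- df[keep, ]
```

Unlike bracket indexing, subset() treats an NA condition as FALSE, so it never produces the all-NA rows, but the rows with a missing drug_start still need the is.na() condition to be kept. Note also that !(drug_start >= date_of_bloods) is equivalent to drug_start < date_of_bloods for non-missing dates, which is the reverse of the date_of_bloods < drug_start comparison used in the answer.
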
Thanks again for your help!
