I'm using R and have two data sets: one contains the reference date (the date of cancer diagnosis) and the other contains the dates of the scans. Some patients have had multiple scans, both before and after the date of diagnosis. I need to get the first scan after the date of diagnosis for each patient. I then plan to merge the data sets so that we can analyse the additional data (not described) that is in the data frames.
I am using lubridate, tidyverse, and dplyr.
The structure of the first data set "a1" is:
patient_id diagnosis_date
1 2018-06-26
2 2014-10-15
3 2016-02-19
4 2018-06-30
The structure of the second data set "a2" is:
patient_id mri_date
1 2018-04-19
1 2018-07-12
1 2018-08-11
2 2014-11-01
3 2016-02-25
3 2018-10-07
For each patient_id, I want to select the first scan on or after the date of diagnosis, i.e. the earliest mri_date where mri_date >= diagnosis_date. For example, that would be mri_date 2018-07-12 for patient 1.
I've tried merging the data sets with combined <- merge(a1, a2, by = "patient_id", all.x = TRUE) and was then planning to filter and slice. However, this dropped the multiple mri_date values for each patient and kept only the first one.
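For reference, here is a minimal sketch of the join, filter and slice approach I had in mind, written with dplyr. It rebuilds a1 and a2 from the example rows above; the names first_scan and combined are only placeholders, and I'm assuming both date columns are stored as Date (not character) and that dplyr is version 1.0.0 or later so that slice_min() is available.

```r
library(dplyr)

# Example data copied from the tables above (dates assumed to be of class Date)
a1 <- data.frame(
  patient_id = c(1, 2, 3, 4),
  diagnosis_date = as.Date(c("2018-06-26", "2014-10-15", "2016-02-19", "2018-06-30"))
)
a2 <- data.frame(
  patient_id = c(1, 1, 1, 2, 3, 3),
  mri_date = as.Date(c("2018-04-19", "2018-07-12", "2018-08-11",
                       "2014-11-01", "2016-02-25", "2018-10-07"))
)

# Pair every scan with its patient's diagnosis date, keep scans on or after
# the diagnosis, then take the earliest remaining scan per patient
first_scan <- a1 %>%
  inner_join(a2, by = "patient_id") %>%
  filter(mri_date >= diagnosis_date) %>%
  group_by(patient_id) %>%
  slice_min(mri_date, n = 1, with_ties = FALSE) %>%
  ungroup()

# Join back onto a1 so patients without a qualifying scan are kept with NA
combined <- a1 %>%
  left_join(select(first_scan, patient_id, mri_date), by = "patient_id")
```

With the example data this should give 2018-07-12 for patient 1, and patient 4 (who has no scans in a2) is kept with a missing mri_date. The final left_join is a choice on my part so that no patients are lost before the later analysis; dropping it would leave only patients who have at least one scan on or after diagnosis.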
I've tried searching for an answer but can't seem to find one.
I would be very grateful for your help.