Here is the dataframe that I am working with:
library(tidyverse)

df <- tribble(
~Patient, ~date, ~Doctor,
"A", "2020-01-01", "A",
"A", "2020-03-01", "A",
"A", "2020-04-30", "B",
"A", "2020-06-29", "C",
"A", "2020-08-28", "A",
"B", "2020-01-01", "A",
"B", "2020-03-01", "B",
"B", "2020-04-30", "B",
"B", "2020-06-29", "B",
"B", "2020-08-28", "C",
"C", "2020-04-30", "A",
"C", "2020-06-29", "A",
"C", "2020-08-28", "B",
"C", "2020-10-27", "C",
"C", "2020-12-26", "A",
)
As you can see, there are three columns: Patient, date, and Doctor.
Here is the desired dataframe that I am working towards.
desired_df <- tribble(
~Patient, ~Number_of_Diff_Doctors_within_180_days,
"A", "3",
"B", "2",
"C", "3",
)
Here is the logic: I'm trying to return a dataframe with one row per patient, giving the number of different doctors that patient has seen within a 180-day window. The 180-day period is a moving window, and the job is to find the maximum number of different doctors seen during any 180-day window for each patient.
In the example, Patient A sees three different doctors (A, B, and C) between 2020-03-01 and 2020-06-29, a span of 120 days that fits inside a 180-day window, so this patient gets a value of 3. Patient B also sees three doctors overall, but sees Doctor A only on 2020-01-01 and Doctor C only on 2020-08-28, which are 240 days apart, so this patient never has more than two doctors in any 180-day window. Patient C has the same doctors and intervals as Patient A, except the dates are shifted forward.
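To double-check the date arithmetic in that example, here is a small base R sketch that computes the two gaps mentioned above. The variable names `gap_a` and `gap_b` are just for illustration.

```r
# Gaps between the visits discussed above, in days (base R only).
# Patient A: first visit with Doctor A in the window to the visit with Doctor C.
gap_a <- as.numeric(as.Date("2020-06-29") - as.Date("2020-03-01"))
# Patient B: only visit with Doctor A to only visit with Doctor C.
gap_b <- as.numeric(as.Date("2020-08-28") - as.Date("2020-01-01"))

gap_a  # 120, fits inside a 180-day window
gap_b  # 240, too far apart for one 180-day window
```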
Here is my attempt so far. It counts the distinct doctors per patient but ignores the dates entirely, because I don't know how to approach the date logic.
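attempt <- df %>%
  # Keep one row per patient/doctor pair
  dplyr::select(Patient, Doctor) %>%
  distinct() %>%
  # Count distinct doctors per patient (dates are ignored here)
  dplyr::group_by(Patient) %>%
  tally() %>%
  # Keep only patients who saw more than one doctor
  filter(n > 1)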
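For reference, the moving-window rule could be sketched roughly as follows. This is only a sketch under assumptions, not a vetted solution: it treats two visits as being in the same window when they are at most 180 days apart, it only tries windows that start on a visit date (any window can be slid forward to its first visit without losing visits), and `max_doctors_in_window` and `visits` are hypothetical names introduced for illustration. It also returns integer counts rather than the character values in `desired_df`.

```r
library(dplyr)
library(tibble)

# Copy of the example data so the sketch runs on its own.
visits <- tribble(
  ~Patient, ~date, ~Doctor,
  "A", "2020-01-01", "A",
  "A", "2020-03-01", "A",
  "A", "2020-04-30", "B",
  "A", "2020-06-29", "C",
  "A", "2020-08-28", "A",
  "B", "2020-01-01", "A",
  "B", "2020-03-01", "B",
  "B", "2020-04-30", "B",
  "B", "2020-06-29", "B",
  "B", "2020-08-28", "C",
  "C", "2020-04-30", "A",
  "C", "2020-06-29", "A",
  "C", "2020-08-28", "B",
  "C", "2020-10-27", "C",
  "C", "2020-12-26", "A"
) %>%
  mutate(date = as.Date(date))

# Hypothetical helper: for each visit, count the distinct doctors seen from
# that visit up to `window_days` later (assumed inclusive), then return the
# largest of those counts.
max_doctors_in_window <- function(dates, doctors, window_days = 180) {
  counts <- vapply(seq_along(dates), function(i) {
    in_window <- dates >= dates[i] & dates <= dates[i] + window_days
    length(unique(doctors[in_window]))
  }, integer(1))
  max(counts)
}

result <- visits %>%
  group_by(Patient) %>%
  summarise(
    Number_of_Diff_Doctors_within_180_days = max_doctors_in_window(date, Doctor),
    .groups = "drop"
  )

result
```

On the example data this gives 3, 2, and 3 for Patients A, B, and C. It compares every visit against every other visit within a patient, which should be fine for small groups but may be slow if a patient has a very large number of visits.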