I have the tables below:

1st Table:

ID Rejected New Expired
001 2021-02-21 2022-03-20 2021-05-20
001 2021-02-21 2022-03-20 2021-05-20
002 2021-06-21 NA 2021-06-20
002 2021-06-21 NA 2021-06-20
003 2021-05-20 NA 2021-05-20
003 2021-05-20 NA 2021-05-20
004 2021-05-20 2021-11-03 2022-06-20
004 2021-05-20 2021-11-03 2022-06-20
005 2021-05-20 2021-11-03 2022-06-20
005 2021-05-20 2021-11-03 2022-06-20

2nd Table:

ID date
001 2021-04-30
002 2021-04-30
003 2021-04-30
004 2021-04-30
005 2021-04-30

Desired Output:

ID Rejected New Expired
001 2021-02-21 2021-02-21 2021-05-20
001 2021-02-21 2021-02-21 2021-05-20
002 2021-06-21 2021-04-30 2021-06-21
002 2021-06-21 2021-04-30 2021-06-21
003 2021-05-20 2021-04-30 2021-05-20
003 2021-05-20 2021-04-30 2021-05-20
004 2021-03-20 2021-03-20 2022-06-20
004 2021-03-20 2021-03-20 2022-06-20
005 2021-05-20 2021-05-20 2022-06-20
005 2021-05-20 2021-05-20 2022-06-20

What I want:

  1. Merge table 1 and 2 by ID, filling only the rows where table1$New is NA (i.e. all NA values in table1$New should be filled with the date values from table2).
  2. After merging, merged$New cannot occur after Rejected or Expired. One solution could be to find the minimum value in each row and place it in New.

My Code:

library(dplyr)

table2 <- q1 %>%              # start from q1 (the 2nd table)
  group_by(ID) %>%
  slice(which.min(date)) %>%  # keep the row with the earliest date for each ID
  rename(New2 = date)         # rename date to New2

merged <- table1 %>%                    # join table2 onto table1
  left_join(table2, by = 'ID') %>%
  mutate(New = coalesce(New, New2)) %>% # replace only the NA values in New
  select(-New2)                         # drop the helper column New2

merged$New <- as.Date(apply(merged[, c(2, 3, 4)], 1, FUN = min))

Issue

The last line of code does not work as expected. When I run it, many of the merged$New values turn to NA, and the previously NA rows of merged$Rejected and merged$Expired are suddenly filled with seemingly random dates.
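One possible cause of the NA values, assuming the three columns are of class Date and some Rejected or Expired values are NA in the real data: apply() converts the data frame to a matrix before looping over rows, so Date columns become character strings, and min() returns NA for any row that contains an NA. The sketch below uses made-up toy data (only the column names come from the question) to show this, followed by pmin() with na.rm = TRUE as one alternative that works on the Date vectors directly.

```r
# Hypothetical toy data; column names follow the question, values are illustrative
merged <- data.frame(
  ID       = c("001", "002"),
  Rejected = as.Date(c("2021-02-21", NA)),
  New      = as.Date(c("2022-03-20", "2021-04-30")),
  Expired  = as.Date(c("2021-05-20", "2021-06-20"))
)

# apply() coerces the data frame to a character matrix before calling min()
row_min_apply <- apply(merged[, c("Rejected", "New", "Expired")], 1, min)
class(row_min_apply)  # character, not Date
row_min_apply         # the row that contains an NA yields NA

# pmin() compares the Date vectors element by element and can skip NAs
merged$New <- pmin(merged$Rejected, merged$New, merged$Expired, na.rm = TRUE)
merged
```

Whether na.rm = TRUE is appropriate depends on how a missing Rejected or Expired date should be treated; without it, pmin() also returns NA for those rows.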

Any help would be appreciated. Also not sure why my third table isn't showing up in html format.

  • @onyambu I am not sure what this means. Could you please clarify? Commented Jun 14, 2022 at 19:16
  • Sorry. I thought it was python. It turns out to be R Commented Jun 14, 2022 at 19:21
  • Check the response below Commented Jun 14, 2022 at 19:24
  • Your starting data is different with regards to Rejected date on ID2 as compared to the one in the desired output Commented Jun 14, 2022 at 19:25
  • Apologies! Will fix now Commented Jun 14, 2022 at 19:50

1 Answer

You can use left_join, then coalesce the two columns and take the row-wise minimum with pmin:

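library(tidyverse)

# df1 is the 1st table, df2 the 2nd: join on ID, fill NA in New from date,
# then take the element-wise minimum of the three date columns
left_join(df1, df2, by = "ID") %>%
    mutate(New = pmin(Rejected, coalesce(New, date), Expired), date = NULL)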

   ID   Rejected        New    Expired
1   1 2021-02-21 2021-02-21 2021-05-20
2   1 2021-02-21 2021-02-21 2021-05-20
3   2 2021-03-21 2021-03-21 2021-05-20
4   2 2021-03-21 2021-03-21 2021-05-20
5   3 2021-05-20 2021-04-30 2021-05-20
6   3 2021-05-20 2021-04-30 2021-05-20
7   4 2021-05-20 2021-05-20 2022-06-20
8   4 2021-05-20 2021-05-20 2022-06-20
9   5 2021-05-20 2021-05-20 2022-06-20
10  5 2021-05-20 2021-05-20 2022-06-20
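For readers who want to see the steps separately, here is a minimal base R sketch of the same idea, without dplyr. The toy data reuse the values for IDs 001 and 003 from the question; the names df1, df2, out and fill are just illustrative. merge() with all.x = TRUE plays the role of left_join() here.

```r
# Toy versions of the two tables (IDs 001 and 003 from the question)
df1 <- data.frame(
  ID       = c("001", "001", "003", "003"),
  Rejected = as.Date(c("2021-02-21", "2021-02-21", "2021-05-20", "2021-05-20")),
  New      = as.Date(c("2022-03-20", "2022-03-20", NA, NA)),
  Expired  = as.Date(c("2021-05-20", "2021-05-20", "2021-05-20", "2021-05-20"))
)
df2 <- data.frame(
  ID   = c("001", "003"),
  date = as.Date(c("2021-04-30", "2021-04-30"))
)

# all.x = TRUE keeps every row of df1, as a left join does
out <- merge(df1, df2, by = "ID", all.x = TRUE)

# Step 1: fill NA in New from date (index assignment keeps the Date class)
fill <- is.na(out$New)
out$New[fill] <- out$date[fill]

# Step 2: element-wise minimum of the three date columns, then drop the helper
out$New  <- pmin(out$Rejected, out$New, out$Expired)
out$date <- NULL
out
```

One practical difference between the two joins: merge() sorts the result by the key column by default, while left_join() keeps the row order of the first table.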

11 Comments

Rows 3 and 4 of New do not match the desired output. Could you check?
@TarJae that is because rows 3 and 4 of Rejected in the original data are different from rows 3 and 4 of Rejected in the desired output.
Hi, this worked! Could you explain how to interpret this part of the code: (New = pmin(Rejected, coalesce(New, date), Expired), date = NULL)? Is the interpretation that New will equal the minimum of Rejected, New, and Expired?
@JohnTayson yes. New is the minimum of the three. Then get rid of date.
Last question, is there a reason to do a left join rather than a merge?