I have a data frame of ~12,000 observations with two columns: "Station" (the station code) and "Date". Each code should have 4 observations and therefore 4 dates, but some are missing. They are not NA values; the rows simply do not exist.
Here is an example of my data frame:
Station Date
7002 17/12/1966
7002 05/05/1968
7002 30/10/1968
7002 16/08/1970
7003 02/12/1966
7003 05/05/1968
7003 31/10/1968
8004 04/07/1968
8004 15/11/1968
8006 13/10/1966
8006 23/09/1967
8006 01/09/1968
[....]
For each code, I need to detect which rows are missing.
I am using "water years", which run from 1 October to the following 30 September (e.g. 01/10/1998 to 30/09/1999). This is the tricky part, and it is what makes my question different from other similar ones.
The period considered runs from 01/10/1966 to 30/09/1970 (4 water years), and the observations in the "Date" column are already fixed per water year (i.e. at most one observation per water year).
My output should look like this:
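A minimal R sketch of the water-year rule, assuming dates are dd/mm/yyyy strings and that each water year is labelled by the calendar year in which it starts (so 01/10/1966 to 30/09/1967 is water year 1966). The function name water_year is hypothetical.

```r
# Label each date with the water year it belongs to. A water year runs from
# 1 October to the following 30 September; here (an assumption) it is
# labelled by the calendar year in which it starts.
water_year <- function(d) {
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  ifelse(m >= 10, y, y - 1L)  # Oct-Dec start a new water year; Jan-Sep belong to the previous one
}

# Parse day/month/year strings from the question into Date objects
dates <- as.Date(c("17/12/1966", "05/05/1968", "30/10/1968", "16/08/1970"),
                 format = "%d/%m/%Y")
print(water_year(dates))  # 1966 1967 1968 1969
```

With this label, "one observation per water year" becomes "one row per (Station, water year) pair", which is easier to check.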
Station Date
7002 17/12/1966
7002 05/05/1968
7002 30/10/1968
7002 16/08/1970
7003 02/12/1966
7003 05/05/1968
7003 31/10/1968
7003 NA
8004 NA
8004 04/07/1968
8004 15/11/1968
8004 NA
8006 13/10/1966
8006 23/09/1967
8006 01/09/1968
8006 NA
[...]
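One way to get this output is to build every Station × water-year combination and left-join the observations onto it, so that combinations without a row come out as NA in their correct position. This is a sketch in base R using the first three stations from the sample, assuming the water years are labelled 1966 to 1969 by their starting year; water_year, grid and out are hypothetical names.

```r
# Sample data: the first three stations from the question
dat <- data.frame(
  Station = c(7002, 7002, 7002, 7002, 7003, 7003, 7003, 8004, 8004),
  Date = c("17/12/1966", "05/05/1968", "30/10/1968", "16/08/1970",
           "02/12/1966", "05/05/1968", "31/10/1968",
           "04/07/1968", "15/11/1968"),
  stringsAsFactors = FALSE
)
dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")

# Water year labelled by the calendar year in which it starts (1 October)
water_year <- function(d) {
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  ifelse(m >= 10, y, y - 1L)
}
dat$WaterYear <- water_year(dat$Date)

# Every station paired with every water year in 01/10/1966 - 30/09/1970
grid <- expand.grid(Station = unique(dat$Station), WaterYear = 1966:1969)

# Left join: combinations with no observation get Date = NA
out <- merge(grid, dat, by = c("Station", "WaterYear"), all.x = TRUE)
out <- out[order(out$Station, out$WaterYear), ]
rownames(out) <- NULL
print(out[, c("Station", "Date")])
```

Unlike simply counting rows, this places each NA in the water year that is actually missing (e.g. 8004 gets NA for 1966/67 and 1969/70). Note that under the stated definition, 13/10/1966 and 23/09/1967 both fall in water year 1966/67, so station 8006 from the sample would get two rows for that year and NA for 1968/69 and 1969/70 rather than the single trailing NA shown above.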
table(dat$Station)[table(dat$Station) < 4] will tell you which stations have fewer than 4 entries; then just rbind() NA rows for those particular stations.
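A runnable sketch of this count-then-rbind approach on the sample data, assuming the column is named Station and that every station should end up with exactly 4 rows; counts, short, n_missing, na_rows and res are hypothetical names.

```r
# Sample data from the question (all four stations)
dat <- data.frame(
  Station = c(7002, 7002, 7002, 7002, 7003, 7003, 7003,
              8004, 8004, 8006, 8006, 8006),
  Date = c("17/12/1966", "05/05/1968", "30/10/1968", "16/08/1970",
           "02/12/1966", "05/05/1968", "31/10/1968",
           "04/07/1968", "15/11/1968",
           "13/10/1966", "23/09/1967", "01/09/1968"),
  stringsAsFactors = FALSE
)

# Stations with fewer than 4 rows, and how many rows each is short
counts <- table(dat$Station)
short <- counts[counts < 4]
n_missing <- 4 - as.integer(short)

# One NA row per missing observation, then append and sort by station
na_rows <- data.frame(
  Station = rep(as.numeric(names(short)), times = n_missing),
  Date = NA_character_,
  stringsAsFactors = FALSE
)
res <- rbind(dat, na_rows)
res <- res[order(res$Station), ]  # order() is stable, so NA rows follow each station's dated rows
rownames(res) <- NULL
print(res)
```

This gives every station 4 rows, but the NA rows always land after the existing dates for that station, so it does not tell you which water year is missing (8004's missing 1966/67 row ends up last). If the position matters, the water year has to be computed from each date first.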