4

My question is somehow related to Fastest way to add rows for missing values in a data.frame? but a bit tougher I think. And I can't figure out how to adapt this solution to my problem.

Here is what my data.table looks like :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   2:               A         3          19600  2005-03-31
   3:               B         3          18200  2005-06-30
   4:               B         4          1230   2005-09-30
   5:               C         2          8700   2005-12-31

The difference is that every 'ida' has his own dates and there is at least one row where 'ida' appears with each date but not necessarily for all 'idb'. I want to insert every missing ('ida','idb') couple missing with the corresponding date and 0 as a value.

Moreover, there is no periodicity for the dates.

How would you do this ?

Desired output :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   1:               A         2            0    2005-03-31
   2:               A         3          19600  2005-03-31
   2:               A         3            0    2004-12-31
   3:               B         3          18200  2005-06-30
   4:               B         3            0    2005-09-30
   5:               B         4          1230   2005-09-30
   4:               B         4            0    2005-06-30
   6:               C         2          8700   2005-12-31

The order doesn't matter. Every date missing is filled with a 0 value.

4
  • So what's the desired output for this sample input? Commented Mar 25, 2015 at 17:34
  • does your example include one missing couple? Commented Mar 25, 2015 at 17:35
  • use seq with range(date), then turn the Date vector into one column data.table, setkey on both DTs on date field, join both, fill NA with 0. If you manage to solve your problem don't forget to put the code in answer and mark as accepted. Commented Mar 25, 2015 at 17:44
  • I added the desired output, sorry about that. @JanGorecki : I can't use seq since there is no periodicity with the date. Commented Mar 25, 2015 at 17:50

1 Answer 1

6

You just do the same thing as in your linked question by each ida:

setkey(dt, idb, date)

dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0][]
#   ida idb value       date
#1:   A   2 26600 2004-12-31
#2:   A   2     0 2005-03-31
#3:   A   3     0 2004-12-31
#4:   A   3 19600 2005-03-31
#5:   C   2  8700 2005-12-31
#6:   B   3 18200 2005-06-30
#7:   B   3     0 2005-09-30
#8:   B   4     0 2005-06-30
#9:   B   4  1230 2005-09-30
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.