Adding row for missing value in data.table

Question

My question is somehow related to Fastest way to add rows for missing values in a data.frame? but a bit tougher I think. And I can't figure out how to adapt this solution to my problem.

Here is what my data.table looks like :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   2:               A         3          19600  2005-03-31
   3:               B         3          18200  2005-06-30
   4:               B         4          1230   2005-09-30
   5:               C         2          8700   2005-12-31

The difference is that every 'ida' has his own dates and there is at least one row where 'ida' appears with each date but not necessarily for all 'idb'. I want to insert every missing ('ida','idb') couple missing with the corresponding date and 0 as a value.

Moreover, there is no periodicity for the dates.

How would you do this ?

Desired output :

                   ida       idb         value     date
   1:               A         2          26600  2004-12-31
   1:               A         2            0    2005-03-31
   2:               A         3          19600  2005-03-31
   2:               A         3            0    2004-12-31
   3:               B         3          18200  2005-06-30
   4:               B         3            0    2005-09-30
   5:               B         4          1230   2005-09-30
   4:               B         4            0    2005-06-30
   6:               C         2          8700   2005-12-31

The order doesn't matter. Every date missing is filled with a 0 value.

use seq with range(date), then turn the Date vector into one column data.table, setkey on both DTs on date field, join both, fill NA with 0. If you manage to solve your problem don't forget to put the code in answer and mark as accepted. — jangorecki
– jangorecki, Commented Mar 25, 2015 at 17:44
I added the desired output, sorry about that. @JanGorecki : I can't use seq since there is no periodicity with the date. — Kevin P
– Kevin P, Commented Mar 25, 2015 at 17:50

eddi · Accepted Answer · 2015-03-25 18:01:23Z

6

You just do the same thing as in your linked question by each ida:

setkey(dt, idb, date)

dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0][]
#   ida idb value       date
#1:   A   2 26600 2004-12-31
#2:   A   2     0 2005-03-31
#3:   A   3     0 2004-12-31
#4:   A   3 19600 2005-03-31
#5:   C   2  8700 2005-12-31
#6:   B   3 18200 2005-06-30
#7:   B   3     0 2005-09-30
#8:   B   4     0 2005-06-30
#9:   B   4  1230 2005-09-30

answered Mar 25, 2015 at 18:01

eddi

49.5k6 gold badges109 silver badges157 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Adding row for missing value in data.table

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related