Adding row for missing value in data.table, under some condition

Question

I'd like to build up on a question I asked earlier (Adding row for missing value in data.table). Actualy I don't need to add so many zero, I only need the first missing one to be 0.

So this time I would like to insert a 0 only if the last entry for the same ('ida', 'idb') is not already 0. We suppose there is no 0 in the original data.table.

So an example would be :

 ida       idb         value     date
  A         3          26600  2004-12-31
  A         3          19600  2005-03-31
  A         3          18200  2005-06-30
  A         4           9560  2004-12-31
  B         1           2600  2004-12-31
  B         1            100  2005-03-31
  B         1           8200  2005-06-30
  B         1           9560  2007-12-31
  B         9           1423  2004-12-31
  B         9           1235  2005-06-30
  C         2           8700  2005-12-31

Gives :

 ida       idb         value     date
  A         3          26600  2004-12-31
  A         3          19600  2005-03-31
  A         3          18200  2005-06-30
  A         4           9560  2004-12-31
  A         4              0  2005-03-31
  B         1           2600  2004-12-31
  B         1            100  2005-03-31
  B         1           8200  2005-06-30
  B         1           9560  2007-12-31
  B         9           1423  2004-12-31
  B         0              0  2005-03-31
  B         9           1235  2005-06-30
  B         0              0  2007-12-31
  C         2           8700  2005-12-31

It's not clear how is this question differs from the previous one. — David Arenburg
– David Arenburg, Commented Mar 25, 2015 at 22:49
So what's the rule here? Why B would have additional two 0 while A will have only one additional? — David Arenburg
– David Arenburg, Commented Mar 25, 2015 at 22:57
There is only one 0 for A because if we added a second one (at the date 2005-06-30) there would be two consecutive 0. B has two additional 0 because the first one is separated by 2005-06-30 from the second one. The rule is : insert a 0 only if no 0 was inserted right before. — Kevin P
– Kevin P, Commented Mar 25, 2015 at 23:01

Kevin P · Accepted Answer · 2015-03-26 10:09:53Z

1

I found a way to do it. So we take the answer to my previous question :

setkey(dt, idb, date)
dt[, .SD[CJ(unique(idb), unique(date))], by = ida][is.na(value), value := 0]

And we add some more steps :

setkey(dt, idb, date)
dt[, .SD[CJ(unique(idb), unique(date))][is.na(value), value := 0][prev.value := c(0, value[-.N])][value > 0 | (value == 0 & prev.value > 0), ][, prev.value := NULL], by = ida]

So for each subset we replace NAs by 0s, and then add a new column corresponding to the previous values. Finally we keep only the rows != 0 and the first 0, and we delete the prev.value column.

answered Mar 26, 2015 at 10:09

Kevin P

2831 gold badge4 silver badges13 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Adding row for missing value in data.table, under some condition

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related