When I have to add missing observations I usually use CJ in data.table after setting the keys.
In the following example, two columns are the combinations of each other, and therefore take the same values. I want to add the missing combinations. What is happening when I use "only" l instead of l[1:3]?
Data:
library(data.table)
l <- letters[1:3]
DT <- CJ(from = l, to = l)
DT <- DT[, .(year = 1994:1995), .(from, to)]
DT[, g := runif(.N)]
# drop some rows
DT <- DT[-c(2, 6)]
# set the keys
setkey(DT, from, to, year)
head(DT)
# from to year g
# 1: a a 1994 0.6436420
# 2: a b 1994 0.6974629
# 3: a b 1995 0.5686354
# 4: a c 1994 0.6436957
# 5: b a 1994 0.6219831
# 6: b a 1995 0.7191914
# unexpected result:
DT[CJ(l,l,1994:1995)]
# from to year g l
# 1: a a 1994 0.6436420 a
# 2: a a 1995 0.6436420 a
# 3: a a 1994 0.6436420 b
# 4: a a 1995 0.6436420 b
# 5: a a 1994 0.6436420 c
# 6: a a 1995 NA c
# 7: b b 1994 0.7617087 a
# 8: b b 1995 0.7617087 a
# 9: b b 1994 0.7617087 b
# 10: b b 1995 0.7617087 b
# 11: b b 1994 0.7617087 c
# 12: b b 1995 0.8479245 c
# 13: c c 1994 0.4891899 a
# 14: c c 1995 0.4891899 a
# 15: c c 1994 0.4891899 b
# 16: c c 1995 0.4891899 b
# 17: c c 1994 0.4891899 c
# 18: c c 1995 0.7520622 c
# what I expected:
DT[CJ(l[1:3],l[1:3],1994:1995)]
# from to year g
# 1: a a 1994 0.64364200
# 2: a a 1995 NA
# 3: a b 1994 0.69746294
# 4: a b 1995 0.56863539
# 5: a c 1994 0.64369566
# 6: a c 1995 NA
# 7: b a 1994 0.62198311
# 8: b a 1995 0.71919139
# 9: b b 1994 0.76170866
# 10: b b 1995 0.84792449
# 11: b c 1994 0.15793127
# 12: b c 1995 0.26623733
# 13: c a 1994 0.89921463
# 14: c a 1995 0.55417635
# 15: c b 1994 0.38938166
# 16: c b 1995 0.03778206
# 17: c c 1994 0.48918988
# 18: c c 1995 0.75206221
sorted=TRUEinCJ. TryDT[CJ(l,l,1994:1995, sorted=FALSE)]l[1:3]the behavior changes