5

Hi still trying to figure out data.table. If I have a data.table of values such as those below, what is the most efficient way to replace the values with those from another data.table?

set.seed(123456)

a=data.table(
  date_id = rep(seq(as.Date('2013-01-01'),as.Date('2013-04-10'),'days'),5),
  px =rnorm(500,mean=50,sd=5),
  vol=rnorm(500,mean=500000,sd=150000),
  id=rep(letters[1:5],each=100)
  )

b=data.table(
  date_id=rep(seq(as.Date('2013-01-01'),length.out=600,by='days'),5),
  id=rep(letters[1:5],each=600),
  px=NA_real_,
  vol=NA_real_
  )

setkeyv(a,c('date_id','id'))
setkeyv(b,c('date_id','id'))

What I'm trying to do is replace the px and vol in b with those in a where date_id and id match I'm a little flummoxed with this - I would suppose that something along the lines of might be the way to go but I don't think this will work in practice.

b[which(b$date_id %in% a$date_id & b$id %in% a$id),list(px:=a$px,vol:=a$vol)]

EDIT

I tried the following

t = a[b,roll=T]
t[!is.na(px),list(px.1:=px,vol.1=vol),by=list(date_id,id)]

and got the error message

Error in `:=`(px.1, px) : 
  := is defined for use in j only, and (currently) only once; i.e., DT[i,col:=1L] and DT[,newcol:=sum(colB),by=colA] are ok, but not DT[i,col]:=1L, not DT[i]$col:=1L and not DT[,{newcol1:=1L;newcol2:=2L}]. Please see help(":="). Check is.data.table(DT) is TRUE.
4
  • 1
    I think the way to do this is with a fast rolling join. This joins the columns to the table based on the key b <- b[a,roll=T] you could just drop the original NA columns after if you don't want them, although there is probably a better way to do it in one step. Commented May 25, 2013 at 11:43
  • Thanks @SimonO101, in practice, b is a very large array which will be progressively populated every day. I need to preserve the remaining values as they are in b Commented May 25, 2013 at 14:07
  • If you're updating something every day, shouldn't you be using a database management system? There are packages that (I think) allow you to access your DB from R: postgresql and RMySQL. Commented May 26, 2013 at 4:23
  • 1
    Thanks @Frank that is where I'd like the information to wind up. This is back data and then each new data block is intended to be transferred to a postgres database. I can't control the transmission medium which is likely to be excel sheets with slightly variable formatting... Wish I could.. :-) Commented May 26, 2013 at 6:56

2 Answers 2

8

If you are wanting to replace the values within b you can use the prefix i.. From the NEWS regarding version 1.7.10

The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.

b[a, `:=`(px = i.px, vol = i.vol)]
Sign up to request clarification or add additional context in comments.

1 Comment

Although this is a very clever and terrific answer, I would take a longer route and do this: merge(a, b, by=c("date_id","id"), all=TRUE) and then rename and check the result.
2

Doesn't sound like you need the roll from your description, and it seems like you want to do this instead when you get your error:

t[!is.na(px),`:=`(px.1=px,vol.1=vol),by=list(date_id,id)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.