R data.table replacing an index of values from another data.table

Question

Hi still trying to figure out data.table. If I have a data.table of values such as those below, what is the most efficient way to replace the values with those from another data.table?

set.seed(123456)

a=data.table(
  date_id = rep(seq(as.Date('2013-01-01'),as.Date('2013-04-10'),'days'),5),
  px =rnorm(500,mean=50,sd=5),
  vol=rnorm(500,mean=500000,sd=150000),
  id=rep(letters[1:5],each=100)
  )

b=data.table(
  date_id=rep(seq(as.Date('2013-01-01'),length.out=600,by='days'),5),
  id=rep(letters[1:5],each=600),
  px=NA_real_,
  vol=NA_real_
  )

setkeyv(a,c('date_id','id'))
setkeyv(b,c('date_id','id'))

What I'm trying to do is replace the px and vol in b with those in a where date_id and id match I'm a little flummoxed with this - I would suppose that something along the lines of might be the way to go but I don't think this will work in practice.

b[which(b$date_id %in% a$date_id & b$id %in% a$id),list(px:=a$px,vol:=a$vol)]

EDIT

I tried the following

t = a[b,roll=T]
t[!is.na(px),list(px.1:=px,vol.1=vol),by=list(date_id,id)]

and got the error message

Error in `:=`(px.1, px) : 
  := is defined for use in j only, and (currently) only once; i.e., DT[i,col:=1L] and DT[,newcol:=sum(colB),by=colA] are ok, but not DT[i,col]:=1L, not DT[i]$col:=1L and not DT[,{newcol1:=1L;newcol2:=2L}]. Please see help(":="). Check is.data.table(DT) is TRUE.

I think the way to do this is with a fast rolling join. This joins the columns to the table based on the key b <- b[a,roll=T] you could just drop the original NA columns after if you don't want them, although there is probably a better way to do it in one step. — Simon O'Hanlon
– Simon O'Hanlon, Commented May 25, 2013 at 11:43
Thanks @SimonO101, in practice, b is a very large array which will be progressively populated every day. I need to preserve the remaining values as they are in b — Tahnoon Pasha
– Tahnoon Pasha, Commented May 25, 2013 at 14:07
If you're updating something every day, shouldn't you be using a database management system? There are packages that (I think) allow you to access your DB from R: postgresql and RMySQL. — Frank
– Frank, Commented May 26, 2013 at 4:23
Thanks @Frank that is where I'd like the information to wind up. This is back data and then each new data block is intended to be transferred to a postgres database. I can't control the transmission medium which is likely to be excel sheets with slightly variable formatting... Wish I could.. :-) — Tahnoon Pasha
– Tahnoon Pasha, Commented May 26, 2013 at 6:56

mnel · Accepted Answer · 2013-05-28 23:21:15Z

8

If you are wanting to replace the values within b you can use the prefix i.. From the NEWS regarding version 1.7.10

The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.

b[a, `:=`(px = i.px, vol = i.vol)]

answered May 28, 2013 at 23:21

mnel

116k28 gold badges269 silver badges255 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

geneorama Over a year ago

Although this is a very clever and terrific answer, I would take a longer route and do this: merge(a, b, by=c("date_id","id"), all=TRUE) and then rename and check the result.

eddi · Accepted Answer · 2013-05-28 17:57:01Z

2

Doesn't sound like you need the roll from your description, and it seems like you want to do this instead when you get your error:

t[!is.na(px),`:=`(px.1=px,vol.1=vol),by=list(date_id,id)]

answered May 28, 2013 at 17:57

eddi

49.5k6 gold badges109 silver badges157 bronze badges

Collectives™ on Stack Overflow

R data.table replacing an index of values from another data.table

2 Answers 2

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related