I'd like to assign a value to a new column, then use that column to create another new column in the same call. The data.table syntax supports assigning several columns at once, but apparently a later column cannot refer to one created earlier in the same call. The "i" and "by" clauses in my real use case are more complicated, so I'd prefer not to repeat them like this:

library(data.table)

dt <- data.table(
  x = 1:5, 
  y = 2:6
)

# this works
dt[x == 3, z1 := x + y]
dt[x == 3, z2 := z1 + 5]

# but I wish this worked
dt[x == 3, `:=`(
  z1 = x + y,
  z2 = z1 + 5
)]

In contrast, this works in dplyr:

library(dplyr)

df <- data.frame(
  x = 1:5, 
  y = 2:6
)

df <- mutate(df,
  z1 = x + y,
  z2 = z1 + 5
)

Is there a clean way to do this using data.table?

EDIT: Tweaking akrun's solution slightly, I figured out a way to keep the readable, sequential syntax I was looking for. It's just doing all of the operations outside the list:

dt[x==3, c('z1','z2','z3') := {
  z1 <- x+y
  z2 <- z1 + 5
  z3 <- z2 + 6
  list(z1, z2, z3) 
}]

  • The dplyr option you showed is not the same as the data.table one, since it does not filter for x == 3. In dplyr, I am guessing we would need either ifelse, or a filter followed by mutate and a left_join, which should be expensive if I am not wrong. Commented Jun 4, 2016 at 5:20
  • That's true. I just brought it up as an example of internal reference. I think I'll leave it as-is because I already know dplyr is slower. Thanks for your help. Commented Jun 4, 2016 at 5:25

1 Answer

We can use curly braces to create a temporary variable, return it in a list together with the calculation based on it, and assign (:=) the list to the columns we need to create.

dt[x == 3, c('z1', 'z2') := {
  z1 <- x + y
  list(z1, z1 + 5)
}]
dt
#   x y z1 z2
#1: 1 2 NA NA
#2: 2 3 NA NA
#3: 3 4  7 12
#4: 4 5 NA NA
#5: 5 6 NA NA
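
To see why this works, it can help to run the braced expression on its own, without :=. The sketch below is only an illustration (the name res is made up for this example): the braces are evaluated as ordinary R code on the filtered rows, and the last value, a list, becomes the columns of the result. Without := nothing is written back to dt.

```r
library(data.table)

dt <- data.table(x = 1:5, y = 2:6)

# Evaluate the braced j expression without := to inspect what it returns.
# z1 is an ordinary local variable inside the braces; the final list
# supplies the result columns.
res <- dt[x == 3, {
  z1 <- x + y
  list(z1 = z1, z2 = z1 + 5)
}]

print(res)        # one row: z1 = 7, z2 = 12
print(names(dt))  # dt itself still has only x and y
```

Putting c('z1', 'z2') := in front of the same braces assigns that list to the named columns by position, which is why the order of the names has to match the order of the list elements.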

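To make it a bit faster, we can set a key with setkey and join on it. Note that the key value has to be wrapped in .(), because a bare number in i selects rows by position rather than by key value.

setkey(dt, x)[.(3L), c('z1', 'z2') := {
  z1 <- x + y
  list(z1, z1 + 5)
}]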
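
As a sanity check on the keyed form, here is a small sketch comparing a vector scan (x == 3) with a keyed join (.(3L)) on a toy table where the rows with x == 3 are not in position 3. The table and the names dt_a and dt_b are invented for this illustration, and the comparison assumes setkey keeps tied rows in their original order.

```r
library(data.table)

# Toy data (hypothetical): x == 3 occurs in rows 1 and 3, not only in row 3.
dt_a <- data.table(x = c(3L, 1L, 3L, 2L, 1L), y = 1:5)
dt_b <- copy(dt_a)

# Vector scan on x == 3
dt_a[x == 3, c("z1", "z2") := {
  z1 <- x + y
  list(z1, z1 + 5)
}]

# Keyed join: .(3L) looks up the key value 3 in the key column x
setkey(dt_b, x)
dt_b[.(3L), c("z1", "z2") := {
  z1 <- x + y
  list(z1, z1 + 5)
}]

# Sort dt_a the same way so the two tables can be compared row by row
setkey(dt_a, x)
print(all.equal(dt_a, dt_b))

# A bare number in i is a row position: this is the third row of the
# sorted table (x == 2 here), not the rows where x == 3.
print(dt_b[(3), x])
```

If the two approaches agree, all.equal() reports TRUE, while the last line shows that (3) picks a single row by position.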

Benchmarks

set.seed(24)
dt1 <- data.table(x = sample(1:9, 1e8, replace=TRUE), y = sample(5:9, 1e8, replace=TRUE))

dt2 <- copy(dt1)
dt3 <- copy(dt1)

akrun1 <- function() {
  dt1[x == 3, c('z1', 'z2') := {
    z1 <- x + y
    list(z1, z1 + 5)
  }]
}

# as benchmarked: (3) subsets by row position, not by key value;
# a keyed join on x == 3 would be written .(3L)
akrun2 <- function() {
  setkey(dt3, x)[(3), c('z1', 'z2') := {
    z1 <- x + y
    list(z1, z1 + 5)
  }]
}

rsoren <- function() {
  dt2[x == 3, z1 := x + y]
  dt2[x == 3, z2 := z1 + 5]
}

library(microbenchmark)
microbenchmark(akrun1(), akrun2(), rsoren(), unit = "relative", times = 20L)
#Unit: relative
#     expr      min       lq     mean   median       uq       max neval
# akrun1() 1.597267 1.605404 1.393016 1.642584 1.538929 0.8634406    20
# akrun2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000    20
# rsoren() 2.584153 2.586185 2.179601 2.694469 2.468219 0.9740701    20
10 Comments

If I'm not mistaken, the z1 <- x+y part is base R and doesn't take advantage of assignment in place with :=. I'm really looking for speed (hence the reluctant switch from dplyr), but you're right that this technically works. Already an improvement over what I had.
@rsoren Here the assignment happens only once (:=), if you have checked the code, and you also don't have to evaluate x==3 multiple times, which would slow down the code again.
Ok, I see what you mean with one assignment. What do you mean by "checked the code"?
@rsoren I meant if you have looked at the code carefully.
Thanks. I'm going to keep this open for a while in hopes that someone can avoid the c('var1','var2','var3') := {stuff1;stuff2;stuff3} syntax, which I don't find as readable as something like my dplyr example.