When using a list column of data.tables in a nested data.table it is easy to apply a function over the column. Example:

library(data.table)
dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

We can use:

dt[ ,list(length = nrow(dt.mtcars[[1]])), by = gear]

   gear length
1:    4     12
2:    3     15
3:    5      5

or

dt[, list( length = lapply(dt.mtcars, nrow)), by = gear]

   gear length
1:    4     12
2:    3     15
3:    5      5

I would like to do the same thing, applying a modification by reference with the := operator to each data.table in the column.

Example:

modify_by_ref <- function(d) {
  d[, max_hp := max(hp)]
}

dt[, modify_by_ref(dt.mtcars[[1]]), by = gear]

That returns the error:

 Error in `[.data.table`(d, , `:=`(max_hp, max(hp))) : 
  .SD is locked. Using := in .SD's j is reserved for possible future use; a tortuously flexible way to modify by group. Use := in j directly to modify by group by reference. 

Following the tip in the error message does not work for me in any way; it seems to target a different case, but maybe I am missing something. Is there a recommended way or a flexible workaround to modify list columns by reference?

  • As I understand the error message, it says that exactly what you are trying to do is not possible (yet). Instead you have to use := directly in your j-expression. Commented Oct 9, 2017 at 15:02
  • The problem with using := directly in the j-expression is that it is possible only if the data.table is unnested first. Commented Oct 9, 2017 at 15:14
  • This is generally not advisable. Stack your tables into one and do by= operations, which are optimized for max and other common summary functions... Commented Oct 9, 2017 at 15:20
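
The stacked approach suggested in the last comment could look roughly like the sketch below. The names `flat` and `nested` are assumptions made only for this illustration; the idea is to use := by group on the unnested table and to nest afterwards if a list column is still wanted.

```r
library(data.table)

# Assumption: work on a flat (unnested) copy of mtcars; `flat` is a name used only here.
flat <- as.data.table(mtcars)

# := directly in j, grouped by gear, adds max_hp to every row by reference.
flat[, max_hp := max(hp), by = gear]

# Nest afterwards if a list column of data.tables is still needed.
nested <- flat[, list(dt.mtcars = list(.SD)), by = gear]
```

With this ordering each nested table already carries max_hp, so no := is needed inside the list column.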

1 Answer

This can be done in two steps or in a single step:

The given table is:

dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

Step 1 - Add a list column to dt that holds the hp vector of each nested table:

dt[, hp_vector := .(list(dt.mtcars[[1]][, hp])), by = list(gear)]

Step 2 - Now calculate the maximum of hp:

dt[, max_hp := max(hp_vector[[1]]), by = list(gear)]

The given table is:

dt<- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

Single step - This combines both of the steps above:

dt[, max_hp := max(dt.mtcars[[1]][, hp]), by = list(gear)]

To populate values inside the nested tables by reference, the answer linked below shows how to do it, although a warning message has to be ignored. I would be happy if anyone could point out how to fix the warning or whether there is any pitfall. For more detail, please refer to the link:

https://stackoverflow.com/questions/48306010/how-can-i-do-fast-advance-data-manipulation-in-nested-data-table-data-table-wi/48412406#48412406

Taking inspiration from that answer, I am going to show how to do it for the given data set.

Let's first clean everything:

rm(list = ls())

Let's redefine the given table in a different way:

dt<- data.table(mtcars)[, list(dt.mtcars = list(data.table(.SD))), by = list(gear)]

Note that I have defined the table slightly differently: I have used data.table in addition to list in the above definition.

Next, populate the maximum by reference within each nested table:

dt[, dt.mtcars := .(list(dt.mtcars[[1]][, max_hp := max(hp)])), by = list(gear)]

And, as one would hope, we can then perform further manipulation within the nested tables:

dt[, dt.mtcars := .(list(dt.mtcars[[1]][, weighted_hp_carb := max_hp*carb])), by = list(gear)]
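
As a possible alternative that does not call := on the locked .SD, the sketch below rebuilds the list column with lapply. It is only an illustration under stated assumptions: `add_cols` is a hypothetical helper, and each nested table is copied and replaced rather than modified strictly in place, so this trades by-reference semantics for a properly allocated copy.

```r
library(data.table)

dt <- data.table(mtcars)[, list(dt.mtcars = list(.SD)), by = gear]

# add_cols() is a hypothetical helper written for this sketch.
add_cols <- function(d) {
  d <- copy(d)                                   # fresh, properly allocated data.table
  d[, `:=`(max_hp = max(hp), new_hp = hp + 1)]   # one summary column, one row-wise column
  d
}

# Replace the whole list column; the outer list() keeps the result a list column.
dt[, dt.mtcars := list(lapply(dt.mtcars, add_cols))]
```

The same pattern covers both a summary (max_hp) and a row-wise transformation (new_hp = hp + 1).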

5 Comments

Thanks for the detailed tutorial, but your solution doesn't actually address the question. It creates a new column in dt, while the desired output would be a new column inside each nested data.table (which, as other users commented, is not possible yet).
To see the difference, consider the case where the function is actually an injection rather than a summary (e.g. new_hp = hp + 1).
Actually, that's why I pointed to the link for more detail. Anyway, here is how we can add max_hp to the nested tables (I have updated my answer). We can also do manipulations. Let me know if you can figure out any pitfalls of using it.
@Nicolas Pinto - any comment??
Great, with the edits your answer is the best solution I have seen so far. I would appreciate it if you edited again to warn that the first suggestion does not solve the problem. I actually think that omitting it and going directly to the solution would be a better approach, but that is your choice. Thanks!
