
I use $ to add a list column to a data.table in R. When the data.table has more than one row, this works as expected.

library(data.table)

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
dt2
#>    x   y
#> 1: 1 1,1
#> 2: 2 2,2

However, when the data.table has exactly one row, only the first element of the vector in the list is kept, and a warning is issued:

dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))
#> Warning in `[<-.data.table`(x, j = name, value = value): Supplied 2 items
#> to be assigned to 1 items of column 'y' (1 unused)
dt1
#>    x y
#> 1: 1 1

This seems inconsistent. Is it a feature or a bug?

By contrast, doing the same thing with data.frames returns the expected output, regardless of the number of rows in the data.frame.

df1 <- data.frame(x = 1)
df1$y <- list(c(1, 1))
df1
#>   x    y
#> 1 1 1, 1

df2 <- data.frame(x = 1:2)
df2$y <- list(c(1, 1), c(2, 2))
df2
#>   x    y
#> 1 1 1, 1
#> 2 2 2, 2
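To double-check that both data.frame results really are list columns, and not just something that prints like one, here is a quick sketch that re-creates df1 and df2 from above and inspects the y columns:

```r
# Re-create the two data.frames from the question
df1 <- data.frame(x = 1)
df1$y <- list(c(1, 1))

df2 <- data.frame(x = 1:2)
df2$y <- list(c(1, 1), c(2, 2))

# Both y columns should be lists holding one numeric vector per row
is.list(df1$y)
is.list(df2$y)
lengths(df2$y)
#> [1] 2 2
```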

3 Answers


Besides Andre Elrico's suggestion to use the [[<- operator, consistent behaviour can also be ensured by using a double-nested list(). This works for the $<- operator as well as for data.table's := assignment operator.

2 row case

library(data.table)
dt2 <- data.table(x = 1:2)
dt2$y <- list(list(c(1, 1), c(2, 2)))
str(dt2)

dt2 <- data.table(x = 1:2)
dt2[, y := .(.(c(1, 1), c(2, 2)))]
str(dt2)

In both variants str(dt2) returns the same:

Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
 $ x: int  1 2
 $ y:List of 2
  ..$ : num  1 1
  ..$ : num  2 2
 - attr(*, ".internal.selfref")=<externalptr>

Note that in data.table syntax, list() can be abbreviated as .().
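A minimal sketch of the alias, assuming an arbitrary two-row table dt and an illustrative column name x2 (neither comes from the question): inside DT[...], the two j expressions below should build the same result.

```r
library(data.table)

dt <- data.table(x = 1:2)
# .() inside DT[...] is shorthand for list()
a <- dt[, .(x, x2 = x * 2)]
b <- dt[, list(x, x2 = x * 2)]
all.equal(a, b)
```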

For comparison, here is the code used by the OP:

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))
str(dt2)

which creates the same structure:

Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
 $ x: int  1 2
 $ y:List of 2
  ..$ : num  1 1
  ..$ : num  2 2
 - attr(*, ".internal.selfref")=<externalptr>

1 row case

dt1 <- data.table(x = 1)
dt1$y <- list(list(c(1, 1)))
str(dt1)

dt1 <- data.table(x = 1)
dt1[, y := .(.(c(1, 1)))]
str(dt1)

Again, the output of str(dt1) is identical for both code variants and also consistent with the 2 row case.

Classes ‘data.table’ and 'data.frame':    1 obs. of  2 variables:
 $ x: num 1
 $ y:List of 1
  ..$ : num  1 1
 - attr(*, ".internal.selfref")=<externalptr>
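If you do this often, a small wrapper makes the extra nesting hard to forget. The sketch below is hypothetical: add_list_col is not a data.table function. It just takes a list with one vector per row and passes it through an outer .(), which gives the same double nesting as .(.(...)) above, so the one-row and two-row cases behave alike.

```r
library(data.table)

# Hypothetical helper (not part of data.table): add a list column `name`
# whose i-th element is vals[[i]]. Modifies `dt` by reference via :=.
add_list_col <- function(dt, name, vals) {
  stopifnot(is.list(vals), length(vals) == nrow(dt))
  # Outer .() is the list of columns to assign; `vals` itself becomes the column
  dt[, (name) := .(vals)]
  invisible(dt)
}

dt1 <- data.table(x = 1)
add_list_col(dt1, "y", list(c(1, 1)))

dt2 <- data.table(x = 1:2)
add_list_col(dt2, "y", list(c(1, 1), c(2, 2)))

str(dt1$y)
str(dt2$y)
```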

It's strange behavior. Feel free to open an issue about it. I don't like $ anyway, because of problems like this one and because the column name after $ has to be written literally rather than taken from a variable.

For list columns I prefer [[<-.

Get your consistent behavior like this:

dt1 <- data.table(x = 1)
dt1[["y"]] <- list(c(1, 1))

dt2 <- data.table(x = 1:2)
dt2[["y"]] <- list(c(1, 1), c(2, 2))
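A quick check, repeating the two snippets above, that [[<- gives a list column in the one-row and the two-row case alike:

```r
library(data.table)

dt1 <- data.table(x = 1)
dt1[["y"]] <- list(c(1, 1))

dt2 <- data.table(x = 1:2)
dt2[["y"]] <- list(c(1, 1), c(2, 2))

# In both tables, y should be a list holding one numeric vector per row
c(is.list(dt1$y), is.list(dt2$y))
```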


From vignette("datatable-intro"):

As long as j returns a list, each element of the list will become a column in the resulting data.table.
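To see the quoted rule in isolation (the tables dt and one_row and the column names are illustrative), note that each list element becomes one column, and that a length-two element yields a length-two column even when the source table has a single row:

```r
library(data.table)

dt <- data.table(x = 1:2)
# j returns a list with two elements, so the result has two columns, a and b
res <- dt[, list(a = x, b = x * 10)]
names(res)
#> [1] "a" "b"

# A length-two element becomes a length-two column, regardless of nrow(one_row)
one_row <- data.table(x = 1)
res1 <- one_row[, list(y = c(1, 1))]
nrow(res1)
#> [1] 2
```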

In your code...

dt1 <- data.table(x = 1)
dt1$y <- list(c(1, 1))

Here list(c(1, 1)) is treated like a j result, i.e. as a list of columns: its only element is a length-two vector, which is interpreted as a length-two column. Since your data.table has only one row, this yields a warning. As noted in Uwe's answer, the way around this is to wrap the value in an extra list(...).

vignette("datatable-reference-semantics") mentions a convenience feature:

DT[, c("colA", "colB", ...) := list(valA, valB, ...)]

# when you have only one column to assign to you
# can drop the quotes and list(), for convenience
DT[, colA := valA]

And this works in your other code...

dt2 <- data.table(x = 1:2)
dt2$y <- list(c(1, 1), c(2, 2))

... but, as you noticed, it falls apart in the special case of one row, where valA should create a list column. So it's better to follow the advice in Uwe's answer and consistently wrap the value in an extra list(...) or .(...).

Also see "What are the smaller syntax differences between data.frame and data.table?" in vignette("datatable-faq") for other differences with data frames.

Side note: there's little point in using a data.table if you assign with DT$y <- v. Avoiding the syntax that modifies the table by reference, namely DT[, y := v], largely defeats the purpose of the package.
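A small sketch of the by-reference alternative, using data.table's address() helper (which reports an object's memory address) as the check: the list column is added in place with :=, with the extra .() wrapping, so the table's address should stay the same afterwards.

```r
library(data.table)

dt <- data.table(x = 1:2)
before <- address(dt)

# Outer .() holds the columns to assign; inner .() is the list column itself
dt[, y := .(.(c(1, 1), c(2, 2)))]

# TRUE if the column was added in place rather than on a copy
identical(before, address(dt))
```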

1 Comment

The other answers are good, but this point was missing and it seemed too tedious to convince them to add it in, so... posting.
