8

I still have a difficult time thinking about how one works with R data.table columns which are lists.

Here is an R data.table:

library(data.table)
dt = data.table(
      numericcol = rep(42, 8),
      listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
  )
> dt
   numericcol        listcol
1:         42        1,22, 3
2:         42              6
3:         42              1
4:         42             12
5:         42    5,   6,1123
6:         42              3
7:         42             42
8:         42              1

I would like to create a column holding the absolute differences between numericcol and each element of listcol:

> dt
   numericcol        listcol    absvals 
1:         42        1,22, 3    41, 20, 39
2:         42              6    36
3:         42              1    41
4:         42             12    30
5:         42    5,   6,1123    37, 36, 1081
6:         42              3    39
7:         42             42    0
8:         42              1    41

So, my first thought would be to use sapply() as follows:

dt[, absvals := sapply(listcol, function(x) abs(x-numericcol))]

This outputs the following:

> dt
   numericcol        listcol absvals
1:         42        1,22, 3      41
2:         42              6      20
3:         42              1      39
4:         42             12      41
5:         42    5,   6,1123      20
6:         42              3      39
7:         42             42      41
8:         42              1      20

So absvals is now a column of unlisted values, with a single value in each row, and it no longer has the same shape as listcol.

(1) How would one create absvals to retain the list structure of listcol?

(2) In cases like these, if I am only interested in a vector of the values, how do R data.table users create such a data structure?

Maybe

vec = as.vector(dt[, absvals := sapply(listcol, function(x) abs(x-numericcol))])

?

5 Answers

11

Another solution using mapply:

dt[, absvals := mapply(listcol, numericcol, FUN = function(x, y) abs(x-y))]

#output
dt
   numericcol        listcol        absvals
1:         42        1,22, 3       41,20,39
2:         42              6             36
3:         42              1             41
4:         42             12             30
5:         42    5,   6,1123   37,  36,1081
6:         42              3             39
7:         42             42              0
8:         42              1             41
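
One caveat worth illustrating, as a sketch on a hypothetical table (dt_eq is my own example, not from the question): mapply() simplifies its result by default, so when every list element happens to have the same length it returns a matrix rather than a list. Passing SIMPLIFY = FALSE keeps the list structure in that case too.

```r
library(data.table)

# Hypothetical table in which every list element has length 2
dt_eq <- data.table(numericcol = c(42, 10),
                    listcol = list(c(1, 22), c(5, 6)))

# Default SIMPLIFY = TRUE: equal-length results are bound into a matrix
simplified <- mapply(function(x, y) abs(x - y),
                     dt_eq$listcol, dt_eq$numericcol)
is.matrix(simplified)

# SIMPLIFY = FALSE always returns a list; the outer list() tells
# data.table that the whole result is a single (list) column
dt_eq[, absvals := list(mapply(function(x, y) abs(x - y),
                               listcol, numericcol,
                               SIMPLIFY = FALSE))]
```

With SIMPLIFY = FALSE the call behaves like Map() with the arguments in a different order.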
1 Comment

superior answer. very nice
3

This is fundamentally a row-wise operation, I think, so the approach is bound to be a bit wonky. And the key to remember with list columns in data.table is that [.data.table assumes any output of j which is a list refers to columns -- so you need to wrap any list in list again to make j understand there's only one column.

I think this works for your case:

dt[ , abs_vals := list(lapply(seq_along(.I), function(ii) 
  abs(listcol[[ii]] - numericcol[ii])))][]
#    numericcol        listcol       abs_vals
# 1:         42        1,22, 3       41,20,39
# 2:         42              6             36
# 3:         42              1             41
# 4:         42             12             30
# 5:         42    5,   6,1123   37,  36,1081
# 6:         42              3             39
# 7:         42             42              0
# 8:         42              1             41

The seq_along(.I) part is handling the row-wise aspect.
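
A possible alternative for the row-wise part, offered as a sketch rather than a drop-in replacement: group by row number so that j sees one row at a time. The by = seq_len(nrow(dt)) grouping and the column name abs_vals2 are my own choices for illustration.

```r
library(data.table)

dt <- data.table(
  numericcol = rep(42, 8),
  listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
)

# One group per row: inside j, listcol is a length-1 list and numericcol
# a length-1 vector. The result is wrapped in list(list(...)) so that j
# returns one column whose single cell holds the whole vector.
dt[, abs_vals2 := list(list(abs(listcol[[1L]] - numericcol))),
   by = seq_len(nrow(dt))]
```

Grouping by every row is likely to be slow on large tables, so this is mainly useful for readability.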

3

Maybe you do not really need a list column? It looks like you could do all of this more simply in long format.

# convert to long format:
dt2 <- dt[, .(var = unlist(listcol)), by = numericcol]
dt2[, absval := abs(var - numericcol)]
dt2
    numericcol  var absval
 1:         42    1     41
 2:         42   22     20
 3:         42    3     39
 4:         42    6     36
 5:         42    1     41
 6:         42   12     30
 7:         42    5     37
 8:         42    6     36
 9:         42 1123   1081
10:         42    3     39
11:         42   42      0
12:         42    1     41

In my experience it is harder and much slower to work with list objects than simple data.tables.
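
Note that grouping by numericcol works here only because every row has the same value. A sketch for the general case, assuming a hypothetical row identifier id (not in the original table) is acceptable:

```r
library(data.table)

# Hypothetical table where numericcol differs between rows
dt <- data.table(numericcol = c(42, 10),
                 listcol = list(c(1, 22, 3), 6))

# Add a row id so each element remembers which row it came from
dt[, id := .I]

# Long format: one row per list element, keyed by the original row
long <- dt[, .(var = unlist(listcol)), by = .(id, numericcol)]
long[, absval := abs(var - numericcol)]

# If the list structure is needed again, collapse back by row id
back <- long[, .(absvals = list(absval)), by = id]
```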

2

You could use apply() to go through your data.table row by row and take the absolute value of the difference between numericcol and each element of listcol, like this:

dt[, absvals := apply(.SD, 
                      1, 
                      function(x) abs(x$numericcol - x$listcol))]

The output is this:

   numericcol        listcol        absvals
1:         42        1,22, 3       41,20,39
2:         42              6             36
3:         42              1             41
4:         42             12             30
5:         42    5,   6,1123   37,  36,1081
6:         42              3             39
7:         42             42              0
8:         42              1             41

2

We can use Map:

dt[, absvals := Map(function(x, y) abs(x-y), listcol, numericcol)]
dt
#    numericcol        listcol        absvals
#1:         42        1,22, 3       41,20,39
#2:         42              6             36
#3:         42              1             41
#4:         42             12             30
#5:         42    5,   6,1123   37,  36,1081
#6:         42              3             39
#7:         42             42              0
#8:         42              1             41

Or with purrr::map2:

dt[, absvals := purrr::map2(listcol, numericcol, ~ abs(.x - .y))]

Instead of looping many times, there is also the option to unlist 'listcol' and take the absolute difference from 'numericcol', replicated according to the lengths of the 'listcol' elements. This could be more efficient:

dt[, absvals := relist(abs(rep(numericcol, lengths(listcol)) - 
                   unlist(listcol)), skeleton = listcol)]

NOTE: Here there is no need to replicate, because 'numericcol' holds the same value in every row; the rep is for the general case.
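
For question (2), where only a flat vector of the values is wanted, the same idea works without relist(). A minimal sketch, with flat as an illustrative name of my own:

```r
library(data.table)

dt <- data.table(
  numericcol = rep(42, 8),
  listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
)

# j returns a plain vector, so the result is a numeric vector rather
# than a data.table; rep() lines numericcol up with each list element
flat <- dt[, abs(rep(numericcol, lengths(listcol)) - unlist(listcol))]
```

No := is used here, so dt itself is left unchanged.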
