How to do operations on list columns in an R data.table to output another list column?

Question

I still have a difficult time thinking about how one works with R data.table columns which are lists.

Here is an R data.table:

library(data.table)
dt = data.table(
      numericcol = rep(42, 8),
      listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
  )
> dt
   numericcol        listcol
1:         42        1,22, 3
2:         42              6
3:         42              1
4:         42             12
5:         42    5,   6,1123
6:         42              3
7:         42             42
8:         42              1

I would like to create a column holding the absolute differences between numericcol and each element of listcol:

> dt
   numericcol        listcol    absvals 
1:         42        1,22, 3    41, 20, 39
2:         42              6    36
3:         42              1    41
4:         42             12    30
5:         42    5,   6,1123    37, 36, 1081
6:         42              3    39
7:         42             42    0
8:         42              1    41

So, my first thought would be to use sapply() as follows:

dt[, absvals := sapply(listcol, function(x) abs(x-numericcol))]

This outputs the following:

> dt
   numericcol        listcol absvals
1:         42        1,22, 3      41
2:         42              6      20
3:         42              1      39
4:         42             12      41
5:         42    5,   6,1123      20
6:         42              3      39
7:         42             42      41
8:         42              1      20

So, absvals is now a column of unlisted elements, with an individual element in each row, and is a different dimension than the data.table.

(1) How would one create absvals to retain the list structure of listcol?

(2) In cases like these, if I am only interested in a vector of the values, how do R data.table users create such a data structure?

Maybe

vec = as.vector(dt[, absvals := sapply(listcol, function(x) abs(x-numericcol))])

?

missuse · Accepted Answer · 2018-04-20 11:11:33Z

11

Another solution using mapply:

dt[, absvals := mapply(listcol, numericcol, FUN = function(x, y) abs(x-y))]

#output
dt
   numericcol        listcol        absvals
1:         42        1,22, 3       41,20,39
2:         42              6             36
3:         42              1             41
4:         42             12             30
5:         42    5,   6,1123   37,  36,1081
6:         42              3             39
7:         42             42              0
8:         42              1             41
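One caveat worth sketching: mapply() simplifies its result by default, so it only returns a list here because the elements of listcol have different lengths. A minimal sketch, assuming you always want a list column back, passes SIMPLIFY = FALSE. The same_len table is a made-up example, not part of the question.

```r
library(data.table)

dt <- data.table(
  numericcol = rep(42, 8),
  listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
)

# SIMPLIFY = FALSE makes mapply() return a list, one element per row,
# regardless of the lengths of the individual results
dt[, absvals := mapply(function(x, y) abs(x - y), listcol, numericcol,
                       SIMPLIFY = FALSE)]

# Hypothetical case where the default would bite:
# every list element has the same length (2)
same_len <- data.table(num = c(10, 20), lst = list(c(1, 2), c(3, 4)))
simplified <- mapply(function(x, y) abs(x - y), same_len$lst, same_len$num)
is.matrix(simplified)  # the default simplification collapsed the result
```

With SIMPLIFY = FALSE the call behaves like Map(), which is mapply() without simplification.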


1 Comment

MichaelChirico Over a year ago

superior answer. very nice

MichaelChirico · Accepted Answer · 2018-04-20 11:09:49Z

This is fundamentally a row-wise operation, I think, so the approach is bound to be a bit wonky. And the key to remember with list columns in data.table is that [.data.table assumes any output of j which is a list refers to columns -- so you need to wrap any list in list again to make j understand there's only one column.

I think this works for your case:

dt[ , abs_vals := list(lapply(seq_along(.I), function(ii) 
  abs(listcol[[ii]] - numericcol[ii])))][]
#    numericcol        listcol       abs_vals
# 1:         42        1,22, 3       41,20,39
# 2:         42              6             36
# 3:         42              1             41
# 4:         42             12             30
# 5:         42    5,   6,1123   37,  36,1081
# 6:         42              3             39
# 7:         42             42              0
# 8:         42              1             41

The seq_along(.I) part is handling the row-wise aspect.
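The same wrapping idea can be combined with any function that already returns one list element per row. The following is only a sketch under that assumption, using Map() in place of the lapply() over row indices; the outer list() tells j that the result is a single column.

```r
library(data.table)

dt <- data.table(
  numericcol = rep(42, 8),
  listcol = list(c(1, 22, 3), 6, 1, 12, c(5, 6, 1123), 3, 42, 1)
)

# Map() pairs listcol[[i]] with numericcol[i] and returns a list of length nrow(dt);
# the outer list() marks that whole list as ONE column for `:=`
dt[, abs_vals := list(Map(function(x, y) abs(x - y), listcol, numericcol))]

# abs_vals is a list column with one numeric vector per row
str(dt$abs_vals[1:2])
```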

minem · Accepted Answer · 2018-04-20 11:27:16Z

3

Maybe you do not really need a list column? It looks like you could do all of this more simply.

# convert to long format:
dt2 <- dt[, .(var = unlist(listcol)), by = numericcol]
dt2[, absval := abs(var - numericcol)]
dt2
    numericcol  var absval
 1:         42    1     41
 2:         42   22     20
 3:         42    3     39
 4:         42    6     36
 5:         42    1     41
 6:         42   12     30
 7:         42    5     37
 8:         42    6     36
 9:         42 1123   1081
10:         42    3     39
11:         42   42      0
12:         42    1     41

In my experience it is harder and much slower to work with list objects than simple data.tables.
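Grouping by numericcol works here because every row holds the same value. As a hedged sketch for the more general case, an explicit row id keeps rows apart even when numericcol repeats or differs. The id column and the small table below are assumptions made for illustration.

```r
library(data.table)

# Hypothetical data: numericcol varies and repeats across rows
dt <- data.table(
  numericcol = c(42, 10, 42),
  listcol = list(c(1, 22, 3), 6, c(5, 6))
)

dt[, id := .I]  # explicit row identifier (assumed helper column)

# Long format: one row per list element; the length-1 numericcol is recycled
long <- dt[, .(numericcol = numericcol, var = unlist(listcol)), by = id]
long[, absval := abs(var - numericcol)]

flat <- long$absval                                   # plain vector of all values
nested <- long[, .(absvals = list(absval)), by = id]  # back to a list column
```

Here long$absval is also the flat vector asked about in (2), and the last line re-nests it only if a list column is needed after all.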


clemens · Accepted Answer · 2018-04-20 11:19:33Z

2

You could use apply() to go through your data.table row by row and get the absolute value of the difference between numericcol and each element of listcol, like this:

dt[, absvals := apply(.SD, 
                      1, 
                      function(x) abs(x$numericcol - x$listcol))]

The output is this:

   numericcol        listcol        absvals
1:         42        1,22, 3       41,20,39
2:         42              6             36
3:         42              1             41
4:         42             12             30
5:         42    5,   6,1123   37,  36,1081
6:         42              3             39
7:         42             42              0
8:         42              1             41


akrun · Accepted Answer · 2018-04-20 11:24:20Z

We can use Map

dt[, absvals := Map(function(x, y) abs(x-y), listcol, numericcol)]
dt
#    numericcol        listcol        absvals
#1:         42        1,22, 3       41,20,39
#2:         42              6             36
#3:         42              1             41
#4:         42             12             30
#5:         42    5,   6,1123   37,  36,1081
#6:         42              3             39
#7:         42             42              0
#8:         42              1             41

Or with purrr::map2

library(purrr)  # provides map2()
dt[, absvals := map2(listcol, numericcol, ~ abs(.x - .y))]

Instead of looping many times, there is also an option to unlist and get the absolute difference from 'numericcol' replicated according to the lengths of 'listcol'. It could be more efficient.

dt[, absvals := relist(abs(rep(numericcol, lengths(listcol)) - 
                   unlist(listcol)), skeleton = listcol)]

NOTE: Here, there is no need to replicate, as 'numericcol' holds the same value in every row, but the rep is for the general case.
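To illustrate that general case, and question (2) about getting a plain vector, here is a small sketch with made-up data in which 'numericcol' differs between rows. The intermediate name flat is an assumption for illustration.

```r
library(data.table)

# Hypothetical data: numericcol is not constant
dt <- data.table(
  numericcol = c(42, 10, 42),
  listcol = list(c(1, 22, 3), 6, c(5, 6))
)

# One vectorised pass: repeat each numericcol value once per element of its list
flat <- dt[, abs(rep(numericcol, lengths(listcol)) - unlist(listcol))]

# flat is an ordinary numeric vector; relist() restores the shape of listcol
dt[, absvals := relist(flat, skeleton = listcol)]
```

If only the vector is needed, the relist() step can be dropped; unlist(dt$absvals) recovers the same vector from the list column.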
