
I am trying to learn data.table syntax. I understand the basics of simple summarization, but I can't work out how to use data.table to generate new columns from an existing column and then summarize.

Here's an MWE where I use dplyr and base tools to make multiple columns from one and then summarize by grouping variables:

Current Input

##    fact1 fact2 X0
## 1      b     2  9
## 2      a     2  6
## 3      b     1  7
## 4      c     2  3
## 5      a     1  8
## 6      a     1  4
## 7      a     1  5
## 8      a     1  1
## 9      b     1  2
## 10     b     2 10

Base + dplyr Code

set.seed(10)
dat <- data.frame(
    fact1 = factor(sample(c('a', 'b', 'c'), 10, TRUE)), 
    fact2 = factor(sample(1:2, 10, TRUE)), 
    X0 = sample(1:10, 10)
)

add <- function(x, y) x + y
z <- sample(1:10, 6, FALSE)

library(dplyr)

z %>% 
    lapply(., add, dat[, 'X0']) %>%
    do.call(cbind, .) %>%
    cbind(dat, .) %>%
    data.frame() %>%
    group_by(fact1, fact2) %>%
    summarise_each(funs(sum))

Desired output

## Source: local data frame [5 x 9]
## Groups: fact1
## 
##   fact1 fact2 X0 X1 X2 X3 X4 X5 X6
## 1     a     1 18 42 22 26 46 30 34
## 2     a     2  6 12  7  8 13  9 10
## 3     b     1  9 21 11 13 23 15 17
## 4     b     2 19 31 21 23 33 25 27
## 5     c     2  3  9  4  5 10  6  7

While I'm asking for a data.table-specific solution, clever base R or dplyr solutions are also welcome and may make this question useful to a broader audience.

  • If you like having an add function, you might try magrittr, which includes it alongside similar functions. (Commented Jul 25, 2015 at 4:18)

2 Answers


There might be better ways, but here is one with data.table:

library(data.table)
setDT(dat)[, paste0("X", 1:6) := lapply(z, add, X0)  # add X1..X6 by reference
           ][, lapply(.SD, sum), by = .(fact1, fact2)]  # sum every column per group

#    fact1 fact2 X0 X1 X2 X3 X4 X5 X6
# 1:     b     2 19 31 21 23 33 25 27
# 2:     a     2  6 12  7  8 13  9 10
# 3:     b     1  9 21 11 13 23 15 17
# 4:     c     2  3  9  4  5 10  6  7
# 5:     a     1 18 42 22 26 46 30 34

2 Comments

You don't need dat$ inside dat; and to match the OP's sorting, one could switch the last part to keyby=.(fact1,fact2). This can also be done in one [], like setDT(dat)[, lapply(lapply(c(0,z), add, X0),sum),keyby=.(fact1,fact2)], though I suppose that might not generalize to the OP's use case.
I liked @Frank's addition as well. +1
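
The keyby suggestion from the comment above can be sketched as follows. This is only an illustration under stated assumptions: dat is typed in from the question's printed input table rather than regenerated with set.seed(10) (sample() results can differ between R versions), and z is inferred from the desired output by subtracting X0 from X1..X6 in a one-row group. The names new_cols and res are made up for this example.

```r
library(data.table)

# Assumption: input copied from the question's printed table.
dat <- data.table(
  fact1 = factor(c("b", "a", "b", "c", "a", "a", "a", "a", "b", "b")),
  fact2 = factor(c(2, 2, 1, 2, 1, 1, 1, 1, 1, 2)),
  X0    = c(9L, 6L, 7L, 3L, 8L, 4L, 5L, 1L, 2L, 10L)
)

# Assumption: offsets inferred from the question's desired output.
z <- c(6L, 1L, 2L, 7L, 3L, 4L)

add <- function(x, y) x + y

# Hypothetical helper name for the new column names X1..X6.
new_cols <- paste0("X", seq_along(z))

# Step 1: add the new columns by reference.
# Step 2: sum every non-grouping column; keyby sorts the result by the groups.
res <- dat[, (new_cols) := lapply(z, add, X0)][
  , lapply(.SD, sum), keyby = .(fact1, fact2)]

res
```

With these inputs the rows come out ordered a/1, a/2, b/1, b/2, c/2, which is the ordering shown in the question's desired output. Note that := modifies dat in place, so dat keeps the X1..X6 columns afterwards.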

A base R option is

dat[paste0('X', 1:6)] <- Map(add, list(dat$X0), z)
aggregate(.~fact1+fact2, dat, FUN=sum)
#  fact1 fact2 X0 X1 X2 X3 X4 X5 X6
#1     a     1 18 42 22 26 46 30 34
#2     b     1  9 21 11 13 23 15 17
#3     a     2  6 12  7  8 13  9 10
#4     b     2 19 31 21 23 33 25 27
#5     c     2  3  9  4  5 10  6  7

Or in a single step

aggregate(.~fact1+fact2, cbind(dat, mapply(add, list(dat$X0), z)), FUN=sum)
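
Both calls above rely on recycling: wrapping the column in list() makes it a length-one argument, so Map and mapply reuse the whole vector for each element of z. A small sketch of that behaviour, using a shortened made-up vector x0 and two offsets rather than the question's full data:

```r
add <- function(x, y) x + y

# Hypothetical toy inputs, not the question's data.
x0 <- c(9L, 6L, 7L)
z  <- c(6L, 1L)

# list(x0) has length 1, so it is recycled against each element of z.
cols <- Map(add, list(x0), z)
cols[[1]]  # x0 + 6
cols[[2]]  # x0 + 1

# mapply does the same, then simplifies the equal-length results
# into a matrix with one column per element of z.
mat <- mapply(add, list(x0), z)
dim(mat)
```

Map always returns a list, which is why it can be assigned to several data frame columns at once, while the matrix from mapply is convenient to pass straight to cbind.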
