0

I'm trying to use a function that calculates some statistics by group in a data.table as follows:

minmax <- function(vec) {
  c(min(vec), max(vec))
}
library(data.table)
iris <- as.data.table(iris)
iris[, c('Min', 'Max') := minmax(Petal.Length), by = Species]

The result should be the min and max Petal.Length by Species and have as many rows as there are Species. That is, the same result as the code below:

merge(
  iris[, .(Min = min(Petal.Length)), Species],
  iris[, .(Max = max(Petal.Length)), Species],
  on = 'Species'
)

      Species Min Max
1:     setosa 1.0 1.9
2: versicolor 3.0 5.1
3:  virginica 4.5 6.9

Note: in my own code I want to do this in one go, not using merge().

3
  • Related: Create aggregate output data.table from function returning multiple output Commented Dec 28, 2021 at 18:29
  • Thank you @Henrik, I struggled momentarily to find the dupe (and then gave up looking). Commented Dec 28, 2021 at 18:32
  • 1
    I don't mind closing this question as a dupe but I think your answer definitely has merit compared to the original one. Commented Dec 28, 2021 at 18:33

1 Answer 1

3

There are a few things here:

  1. The := is for adding columns to an existing frame, so it will not summarize as you've demonstrated. That is, DT[, a := b] should always have exactly the same number of rows. I think this is not what you need.

  2. You can do multiple-assignment in your summarizing code and do away with the merge (perhaps this is similar to the code you said you do "in one go"):

    iris[, .(Min = min(Petal.Length), Max = max(Petal.Length)), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
  3. But finally, you are asking how to use your function to get this. First attempts might be:

    minmax <- function(vec) c(min(vec), max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species    V1
    #        <fctr> <num>
    # 1:     setosa   1.0
    # 2:     setosa   1.9
    # 3: versicolor   3.0
    # 4: versicolor   5.1
    # 5:  virginica   4.5
    # 6:  virginica   6.9
    
    iris[, as.list(minmax(Petal.Length)), by = .(Species)]
    #       Species    V1    V2
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    iris[, setNames(as.list(minmax(Petal.Length)), c("Min", "Max")), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    minmax <- function(vec) c(Min = min(vec), Max = max(vec))
    iris[, as.list(minmax(Petal.Length)), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    

    So we can change the function to return a list (optionally named).

    minmax <- function(vec) list(min(vec), max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species    V1    V2
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    iris[, setNames(minmax(Petal.Length), c("Min", "Max")), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
    minmax <- function(vec) list(Min = min(vec), Max = max(vec))
    iris[, minmax(Petal.Length), by = .(Species)]
    #       Species   Min   Max
    #        <fctr> <num> <num>
    # 1:     setosa   1.0   1.9
    # 2: versicolor   3.0   5.1
    # 3:  virginica   4.5   6.9
    
Sign up to request clarification or add additional context in comments.

5 Comments

This will work for me, thanks! It's not possible to specify the column names inside the [?
There are several options in this answer that allow you to set names on the return value given the function does not name them itself. See the use of setNames(as.list(...)), recently edited in.
The key for both summarizing and naming is to produce a list (as.list) and use setNames without actual :=-assignment.
Yes, now I see. I was stuck on trying to use a vector. Using a vector was not a great idea in any case because it can't handle multiple types.
@BobJansen Just remember the basic property of j; straight from the help text: "As long as j returns a list, each element of the list becomes a column in the resulting data.table" Cheers

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.