3

I would like to write a function that summarize the provided data by some specified criteria, in this case by age

The example data is a table of users' age and their stats.

df <- data.frame('Age'=rep(18:25,2), 'X1'=10:17, 'X2'=28:35,'X4'=22:29)

Next I define the output columns that are relevant for the analysis

output_columns <- c('Age', 'X1', 'X2', 'X3')

This function computes the basic the sum of X1. X2 and X3 grouped by age.

aggr <- function(data, criteria, output_columns){
  k <- data %>% .[, colnames(.) %in% output_columns] %>%
    group_by_(.dots = criteria) %>%
    #summarise_each(funs(count), age) %>%
    summarize_if(is.numeric, sum)
  return (k)
}

When I call it like this

> e <- aggr(df, "Age", output_columns)
> e
# A tibble: 8 x 3
    Age    X1    X2
  <int> <int> <int>
1    18    20    56
2    19    22    58
3    20    24    60
4    21    26    62
5    22    28    64
6    23    30    66
7    24    32    68
8    25    34    70

I want to have another column called count which shows the number of observations in each age group. Desired output is

> desired
  Age X1 X2 count
1  18 20 56     2
2  19 22 58     2
3  20 24 60     2
4  21 26 62     2
5  22 28 64     2
6  23 30 66     2
7  24 32 68     2
8  25 34 70     2

I have tried different ways to do that, e.g. tally(), summarize_each etc. They all deliver wrong results.

I believe their should be an easy and simple way to do that. Any help is appreciated.

2
  • I think you need %>% group_by_at(criteria) %>% mutate(count = n()) added into your function Commented May 24, 2018 at 14:22
  • 1
    The last line in the pipeline could be replaced with: { cbind(summarize_if(., is.numeric, sum), summarize(., n = n())) } Commented May 24, 2018 at 14:55

3 Answers 3

4

Since you're already summing all variables, you can just add a column of all 1s before the summary function

aggr <- function(data, criteria, output_columns){ 
    data %>% 
      .[, colnames(.) %in% output_columns] %>%
      group_by_(.dots = criteria) %>%
      mutate(n = 1L) %>%
      summarize_if(is.numeric, sum)
}

# A tibble: 8 x 4
    Age    X1    X2     n
  <int> <int> <int> <int>
1    18    20    56     2
2    19    22    58     2
3    20    24    60     2
4    21    26    62     2
5    22    28    64     2
6    23    30    66     2
7    24    32    68     2
8    25    34    70     2
Sign up to request clarification or add additional context in comments.

2 Comments

Is it better than my answer
that's clever, like it
1

We could create the 'count' column before summarise_if

aggr<- function(data, criteria, output_columns){
                data %>% 
                   select(intersect(names(.), output_columns))%>%
                   group_by_at(criteria)%>%   
                   group_by(count = n(), add= TRUE) %>%                                
                   summarize_if(is.numeric,sum) %>%
                   select(setdiff(names(.), 'count'), count)                                     

    }




aggr(df,"Age",output_columns)
# A tibble: 8 x 4
# Groups:   Age [8]
#    Age    X1    X2 count
#  <int> <int> <int> <int>
#1    18    20    56     2
#2    19    22    58     2
#3    20    24    60     2
#4    21    26    62     2
#5    22    28    64     2
#6    23    30    66     2
#7    24    32    68     2
#8    25    34    70     2

3 Comments

@Ryan That is a good way. I find that you posted an answer. I like it
adding this line group_by(count = n(), add= TRUE) to my original function works just as fine. Can that lead to any mistake?
@esem After the summarise step, you can ungroup if you need any more transformations
1

In base R you could do

aggr <- function(data, criteria, output_columns){
  ds <- data[, colnames(data) %in% output_columns]
  d <- aggregate(ds, by=list(criteria), function(x) c(sum(x), length(x)))
  "names<-"(do.call(data.frame, d)[, -c(2:3, 5)], c(names(ds), "n"))
}

> with(df, aggr(df, Age, output_columns))
  Age X1 X2 n
1  18 20 56 2
2  19 22 58 2
3  20 24 60 2
4  21 26 62 2
5  22 28 64 2
6  23 30 66 2
7  24 32 68 2
8  25 34 70 2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.