I am calculating the median and several other quantiles for a continuous variable, and I want to add all the resulting columns in a single step. Is this possible? A reproducible example follows.

library(dplyr)
library(tidyr)

df <- data.frame(group = rep(c('group1','group2'),50),
                 x = rnorm(100),
                 y = rnorm(100))
df %>%
  gather('variable','value', -group) %>%
  group_by(group, variable) %>%
  summarise(median = round(quantile(value, 0.5, na.rm = TRUE), 2),
            iqr25 = round(quantile(value, 0.25, na.rm = TRUE), 2),
            iqr75 = round(quantile(value, 0.75, na.rm = TRUE), 2))

OUTPUT

# A tibble: 4 x 5
# Groups:   group [2]
  group  variable median iqr25 iqr75
  <fct>  <chr>     <dbl> <dbl> <dbl>
1 group1 x          0.06 -0.74  1.04
2 group1 y         -0.36 -1.03  0.45
3 group2 x         -0.04 -0.85  0.62
4 group2 y          0.06 -0.56  0.89

Can this summarise step be done without writing the quantile function three times?

I came up with the workaround below, but is there a nicer way to do this?

df %>%
  gather('variable','value', -group) %>%
  group_by(group, variable) %>%
  summarise(s = toString(round(quantile(value, c(0.25,0.5,0.75), na.rm = TRUE), 2))) %>%
  separate(s, into = c('q25','median','q75'), sep = ',')

2 Answers

You can nest the data after the group_by() and then map quantile() over the nested data frames:

library(purrr)  # provides map()

df %>% 
  gather('variable','value', -group) %>% 
  group_by(group, variable) %>% 
  nest() %>% 
  mutate(quant = map(data, ~quantile(.$value, probs = c(0.25, 0.5, 0.75))),
         quant = map(quant, t),
         quant = map(quant, as.data.frame),
         quant = map(quant, setNames, c("iqr25", "median", "iqr75"))) %>% 
  unnest(quant) %>% 
  select(-data)

# A tibble: 4 x 5
  group  variable  iqr25  median iqr75
  <fct>  <chr>     <dbl>   <dbl> <dbl>
1 group1 x        -0.876 -0.173  0.471
2 group2 x        -0.372  0.0507 0.519
3 group1 y        -0.785 -0.109  0.618
4 group2 y        -0.944 -0.117  0.647
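
To make the three intermediate map() steps above easier to follow, here is a minimal sketch of what they do to a single group's values, using only base R. The vector v and the seed are made up for illustration; they are not part of the original example.

set.seed(1)       # arbitrary seed, for illustration only
v <- rnorm(50)    # hypothetical stand-in for one group's `value` column

```r
set.seed(1)       # arbitrary seed, for illustration only
v <- rnorm(50)    # hypothetical stand-in for one group's `value` column

q <- quantile(v, probs = c(0.25, 0.5, 0.75))     # named vector: 25%, 50%, 75%
m <- t(q)                                        # 1 x 3 matrix
d <- as.data.frame(m)                            # one-row data frame
d <- setNames(d, c("iqr25", "median", "iqr75"))  # friendlier column names

d
```

Each nested data frame ends up as a one-row data frame like d, which is why unnest() can then bind them into one row per group.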
3 Comments

Nice! Can we add column names to this?
See the edit for names. You could also use dplyr::rename after the unnest.
Thanks, this nest-followed-by-map approach helps with a lot of other things as well. I will try to incorporate it into my routine workflow.
Another approach using nest:

df %>%
  gather('variable', 'value', -group) %>%
  group_by(group, variable) %>%
  nest() %>%
  mutate(quants = map(data, function(x) 
    quantile(x$value, c(0.25,0.5,0.75)))) %>%
  unnest(quants) %>%
  group_by(group, variable) %>%
  mutate(case = c("iqr25", "median" , "iqr75")) %>%
  spread(case, quants) %>% 
  mutate_if(is.numeric, round, 2)

# A tibble: 4 x 5
# Groups:   group, variable [4]
  group  variable iqr25 iqr75 median
  <fct>  <chr>    <dbl> <dbl>  <dbl>
1 group1 x        -0.5   0.7    0.09
2 group1 y        -0.54  0.7    0.1 
3 group2 x        -0.59  0.61  -0.06
4 group2 y        -0.89  0.35  -0.11

4 Comments

This code actually looks more verbose and messy than calling the quantile function three times in summarise!
Well, I for one think that your original code is the cleanest and most readable, even though you are calling quantile three times. The alternatives perform unnecessary transformations on the whole dataset to achieve the same thing in more lines of code. I agree that Richard's solution is cleaner than mine, but we are still replacing 3 calls to quantile with 4 calls to map.
Yes, that is the reason I didn't accept his answer.
You could just use one call to map, but the contents then get a bit messy: map(data, ~{quantile(.$value, probs = c(0.25, 0.5, 0.75)) %>% t() %>% as.data.frame() %>% setNames(c("iqr25", "median", "iqr75"))})
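
Following up on the single-call idea, one possible sketch keeps the original summarise() and calls quantile() once per group, storing the named result in a list-column and widening it with tidyr::unnest_wider(). This is not taken from either answer; it assumes dplyr >= 1.0.0 (for the .groups argument) and tidyr >= 1.0.0 (for unnest_wider()), and the seed and the name res are arbitrary.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed so the sketch is reproducible
df <- data.frame(group = rep(c("group1", "group2"), 50),
                 x = rnorm(100),
                 y = rnorm(100))

res <- df %>%
  gather("variable", "value", -group) %>%
  group_by(group, variable) %>%
  # one quantile() call per group, kept as a named vector in a list-column
  summarise(q = list(setNames(
              round(quantile(value, c(0.25, 0.5, 0.75), na.rm = TRUE), 2),
              c("iqr25", "median", "iqr75"))),
            .groups = "drop") %>%
  # turn each name of the vector into its own column
  unnest_wider(q)

res
```

The result has one row per group/variable pair with numeric iqr25, median and iqr75 columns, so no string parsing with separate() is needed.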