How to divide variables between groups of rows using dplyr without listing them?

Question

Following up on this question: How to divide between groups of rows using dplyr?

If I have this data frame:

id = c("a","a","b","b","c","c")
condition = c(0,1,0,1,0,1)
gene1 = sample(1:100,6)
gene2 = sample(1:100,6)
#...
geneN = sample(1:100,6)

df = data.frame(id,condition,gene1,gene2,geneN)

I want to group by id and divide the values in the rows with condition == 0 by those in the rows with condition == 1, to get this:

df[condition == 0, 3:5] / df[condition == 1, 3:5]
      gene1     gene2     geneN
1 0.2187500 0.4946237 0.3750000
3 0.4700000 0.6382979 0.5444444
5 0.7674419 0.5471698 2.3750000

I can use dplyr as follows:

library(dplyr)

df %>% 
    group_by(id) %>%
    summarise(gene1 = gene1[condition == 0] / gene1[condition == 1],
              gene2 = gene2[condition == 0] / gene2[condition == 1],
              geneN = geneN[condition == 0] / geneN[condition == 1])

But I have many variables, e.g. 100 genes as below. How can I do this without listing every gene?

id = c("a","a","b","b","c","c")
condition = c(0,1,0,1,0,1)
genes = matrix(1:600,ncol = 100)
df = data.frame(id,condition,genes)

Please, can you revise your example to include "many variables"? – Roman, commented Feb 13, 2018 at 15:23

www · Accepted Answer · 2018-02-13 15:31:04Z

3

We can use summarise_at to apply the same operation to many columns.

library(dplyr)

df2 <- df %>%
  group_by(id) %>%
  arrange(condition) %>%
  summarise_at(vars(-condition), funs(first(.)/last(.))) %>%
  ungroup()
df2
# # A tibble: 3 x 4
#   id    gene1 gene2 geneN
#   <fct> <dbl> <dbl> <dbl>
# 1 a     0.524 2.28  0.654
# 2 b     1.65  0.616 1.38 
# 3 c     0.578 2.00  2.17
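In dplyr 1.0.0 and later, summarise_at() is superseded and funs() is deprecated, so here is a minimal sketch of the same idea with across(). It assumes exactly one condition == 0 row and one condition == 1 row per id. Because it indexes by condition instead of using first()/last(), the rows do not need to be sorted first.

```r
library(dplyr)

# Question's example: 3 ids x 2 conditions, 100 gene columns
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)  # unnamed matrix becomes columns X1 ... X100
)

df2 <- df %>%
  group_by(id) %>%
  # across() applies the lambda to every column except the grouping column
  # (id, excluded automatically) and condition (excluded explicitly);
  # inside the lambda, `condition` is the current group's condition values
  summarise(across(-condition, ~ .x[condition == 0] / .x[condition == 1]))

df2
```

The result has one row per id and one ratio column per gene, without naming any gene.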

edited Feb 13, 2018 at 15:31

answered Feb 13, 2018 at 15:23

www



4 Comments

CPak Over a year ago

You might want to add an arrange() to ensure that you're dividing the right rows, since first() and last() don't check for that.

www Over a year ago

@CPak Good idea. I will add that.
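To illustrate why the arrange() step matters, here is a small hypothetical data set in which the condition == 1 row comes first for id "b". Without sorting, first()/last() simply take the row order within each group.

```r
library(dplyr)

# Hypothetical toy data: for id "b" the condition == 1 row comes first
df <- data.frame(
  id = c("a", "a", "b", "b"),
  condition = c(0, 1, 1, 0),
  gene1 = c(1, 2, 4, 3)
)

# Without arrange(): for "b" this computes 4 / 3 (condition 1 / condition 0)
unsorted <- df %>%
  group_by(id) %>%
  summarise(gene1 = first(gene1) / last(gene1))

# With arrange(condition): condition 0 is first and condition 1 is last
# in every group, so "b" gives 3 / 4 as intended
sorted <- df %>%
  arrange(condition) %>%
  group_by(id) %>%
  summarise(gene1 = first(gene1) / last(gene1))
```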

Elmahy Over a year ago

This answer is great, but it's very slow with larger data, e.g. id = c("a","a","b","b","c","c"); condition = c(0,1,0,1,0,1); genes = matrix(1:30000,ncol = 5000); df = data.frame(id,condition,genes)

www Over a year ago

If that is the case, perhaps explore data.table solutions, or use a matrix for the whole calculation.
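A minimal base R sketch of the matrix route, assuming each id has exactly one row per condition; the names m, m0, m1 and ratio are just illustrative. It converts the gene columns to a numeric matrix once and divides element-wise, which avoids one function call per group and per column and is likely to scale better than the grouped summarise (not benchmarked here).

```r
# Data of the size mentioned in the comment above: 6 rows, 5000 gene columns
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:30000, ncol = 5000)  # columns X1 ... X5000
)

# Convert all gene columns (everything except id and condition) once
m <- as.matrix(df[, -(1:2)])

is0 <- df$condition == 0
is1 <- df$condition == 1
ids0 <- as.character(df$id[is0])
ids1 <- as.character(df$id[is1])

# Align the condition-1 rows to the id order of the condition-0 rows,
# so the result does not depend on how the input rows are sorted
m0 <- m[is0, , drop = FALSE]
m1 <- m[is1, , drop = FALSE][match(ids0, ids1), , drop = FALSE]

# Element-wise division: one row per id, one column per gene
ratio <- m0 / m1
rownames(ratio) <- ids0
```

If a data frame is needed afterwards, data.frame(id = rownames(ratio), ratio) converts it back.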

Roman · Accepted Answer · 2018-02-13 15:31:57Z

1

You can try

library(dplyr)
library(tidyr)  # for gather() and spread()

df %>% 
  gather(k,v, -id, -condition) %>% 
  spread(condition, v) %>% 
  mutate(ratio=`0`/`1`) %>% 
  select(id, k, ratio) %>% 
  spread(k, ratio)
  id      gene1     gene2    geneN
1  a  0.3670886 0.5955056 1.192982
2  b  0.4767442 1.2222222 0.125000
3  c 18.2000000 2.0909091 6.000000

I used your data with set.seed(123).
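Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Here is a sketch of the same reshaping with the newer functions, run on the question's 100-gene data rather than the set.seed(123) data above:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)  # columns X1 ... X100
)

res <- df %>%
  # long format: one row per id x condition x gene
  pivot_longer(-c(id, condition), names_to = "gene", values_to = "value") %>%
  # one column per condition, named "0" and "1"
  pivot_wider(names_from = condition, values_from = value) %>%
  mutate(ratio = `0` / `1`) %>%
  select(id, gene, ratio) %>%
  # back to one column per gene
  pivot_wider(names_from = gene, values_from = ratio)

res
```

One difference worth knowing: spread() sorts the new column names alphabetically (X1, X10, X100, X11, ...), while pivot_wider() keeps them in order of first appearance by default, so the genes stay in their original order.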

answered Feb 13, 2018 at 15:31

Roman



moodymudskipper · Accepted Answer · 2018-02-13 16:12:39Z

0

If your dataset is sorted and has no irregularities, you can do this with purrr::map_dfr:

library(dplyr)  # for %>%
library(purrr)  # for map_dfr()

df[paste0("gene",c(1,2,"N"))] %>% map_dfr(~.x[c(F,T)]/.x[c(T,F)])
# # A tibble: 3 x 3
#       gene1    gene2      geneN
#       <dbl>    <dbl>      <dbl>
# 1 0.1764706 1.323944 38.5000000
# 2 0.4895833 0.531250  0.3478261
# 3 0.3278689 2.705882  1.2424242

Or its base R equivalent:

as.data.frame(lapply(df[paste0("gene",c(1,2,"N"))],function(x) x[c(F,T)]/x[c(T,F)]))

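You may need to bind the ids back on; I skipped this step because it's not in your expected output. Also note that, as written, both versions divide the even rows (condition == 1) by the odd rows (condition == 0); swap the two index vectors to get condition 0 / condition 1 as in the question.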
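A sketch of the base R variant generalized to all gene columns (everything except id and condition), with the ids bound back on. It selects rows with logical masks on condition, dividing condition 0 by condition 1 as in the question, and still assumes the condition 0 rows and the condition 1 rows list the ids in the same order.

```r
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)  # columns X1 ... X100
)

is0 <- df$condition == 0
is1 <- df$condition == 1

# Every column except id and condition, so no gene names are listed
gene_cols <- df[, -(1:2)]

# Divide condition-0 values by condition-1 values, column by column
ratios <- as.data.frame(lapply(gene_cols, function(x) x[is0] / x[is1]))

# Bind the ids back on (taken from the condition-0 rows)
res <- data.frame(id = df$id[is0], ratios)
```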

answered Feb 13, 2018 at 16:12

moodymudskipper

