I have the following data frame with 345 rows and 237 columns in R:

snp1 snp2 snp3 ... snp237 
0 1 2 ... 0
0 1 1 ... 1
1 1 2 ... 2
1 0 0 ... 0
... ... ... ...
2 2 1 ... 0

I want to apply the following calculation to each column:

D=(number of 0)/(number of rows)
H=(number of 1)/(number of rows)
R=(number of 2)/(number of rows)
p=D+(0.5*H)
q=R+(0.5*H)
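
A short check on these definitions, assuming every entry in a column is 0, 1, or 2 and there are no missing values (neither is stated explicitly above): the three proportions then sum to one, so p and q are complementary.

```latex
D + H + R = 1
\quad\Longrightarrow\quad
p + q = \left(D + \tfrac{1}{2}H\right) + \left(R + \tfrac{1}{2}H\right) = D + H + R = 1
```

Under that assumption q = 1 - p, which gives a quick sanity check on any computed result.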

Lastly, I want to store p and q for each SNP in a vector. Ideally, a single R command would calculate p and q for every SNP. Is this possible?

The expected output is:

snp1 snp2 snp3 ... snp237
p1 p2 p3 ... p237
q1 q2 q3 ... q237

Thanks in advance.

  • What does your expected output look like? Commented Mar 24, 2019 at 2:52
  • I expect a vector with p and q for each snp (column). Commented Mar 24, 2019 at 2:58
  • Try f1 <- function(x) {H <- mean(x == 1);list(p = mean(x == 0) + (0.5 * H), q = mean(x == 2) + (0.5 * H))}; library(dplyr); df1 %>% summarise_all(f1) %>% unnest Commented Mar 24, 2019 at 3:05

2 Answers

#DATA
set.seed(42)
d = data.frame(snp1 = sample(0:2, 10, TRUE),
               snp2 = sample(0:2, 10, TRUE),
               snp3 = sample(0:2, 10, TRUE))

#Function    
foo = function(x){
    len = length(x)
    D = sum(x == 0)/len
    H = sum(x == 1)/len
    R = sum(x == 2)/len
    p = D + 0.5 * H
    q = R + 0.5 * H
    return(c(p = p, q = q))
}

#Run foo for each column   
sapply(d, foo)
#  snp1 snp2 snp3
#p 0.35 0.4  0.35
#q 0.65 0.6  0.65
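
Because D, H, and R are all column-wise proportions, the same result can also be sketched without a helper function by taking column means of logical comparisons. This is a minimal sketch, not part of the original answer: the data frame `geno` is a hypothetical toy example, and it assumes every column holds only 0, 1, or 2 with no `NA` values.

```r
# Toy genotype data (hypothetical): rows are samples, columns are SNPs coded 0/1/2
geno <- data.frame(snp1 = c(0, 0, 1, 1),
                   snp2 = c(1, 1, 1, 0),
                   snp3 = c(2, 1, 2, 0))

# geno == k is a logical matrix; colMeans() returns the share of TRUEs per column
H <- colMeans(geno == 1)
p <- colMeans(geno == 0) + 0.5 * H
q <- colMeans(geno == 2) + 0.5 * H

# Stack into a 2 x n matrix with rows "p" and "q" and one column per SNP
freq <- rbind(p = p, q = q)
freq
#   snp1  snp2  snp3
# p 0.75 0.625 0.375
# q 0.25 0.375 0.625
```

If the real data contain missing genotypes, `colMeans(..., na.rm = TRUE)` would use the number of non-missing entries as the denominator rather than the number of rows, which may or may not be what is wanted.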

Here is an option with the tidyverse. Create a function (f1), based on the logic in the OP's code, that returns p and q wrapped in a nested list, then use it in summarise_all to apply the function to each column of the dataset and unnest the result:

library(dplyr)
library(tidyr)
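f1 <- function(x) {
  # Half the proportion of heterozygotes (1), shared by p and q
  H <- 0.5 * mean(x == 1)
  # Wrap the result in an outer list so summarise_all gets one value per column
  list(list(p = mean(x == 0) + H,
            q = mean(x == 2) + H))
}
df1 %>%
  summarise_all(f1) %>%
  unnest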
#  snp1  snp2  snp3
#1 0.75 0.625 0.375
#2 0.25 0.375 0.625

data

df1 <- structure(list(snp1 = c(0L, 0L, 1L, 1L), snp2 = c(1L, 1L, 1L, 
 0L), snp3 = c(2L, 1L, 2L, 0L)), class = "data.frame", row.names = c(NA, 
  -4L))
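
For readers on a recent dplyr, the same idea might be sketched with `reframe()` and `across()`, which allow a summary function to return more than one row per column. This is not part of the original answer: it assumes dplyr 1.1.0 or later (where `reframe()` was introduced), and the names `f2`, `res`, and `stat` are hypothetical.

```r
library(dplyr)

# Same example data as above
df1 <- data.frame(snp1 = c(0L, 0L, 1L, 1L),
                  snp2 = c(1L, 1L, 1L, 0L),
                  snp3 = c(2L, 1L, 2L, 0L))

# f2 (hypothetical name) returns a plain length-2 vector: p first, then q
f2 <- function(x) {
  H <- 0.5 * mean(x == 1)
  c(p = mean(x == 0) + H, q = mean(x == 2) + H)
}

# reframe() accepts results of any length, so each SNP column yields two rows;
# stat is added last so that across(everything()) does not pick it up
res <- df1 %>%
  reframe(across(everything(), f2),
          stat = c("p", "q"))
res
```

Returning a plain vector avoids the list-column and the `unnest` step, at the cost of requiring the newer dplyr version.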
