I would like to have a general function to perform multiple t-tests on data in a data frame, using the following example data:

dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))

Basically, I want to run two t-tests on the values of X for each unique DRUG-ADR combination. Taking D1-A1 as an example, I want to test the X values for D1-A1 versus D1-A<>1, and the X values for D1-A1 versus D<>1-A1. Below is my code for this example. My question is how to write a general loop or function that performs both tests for every unique DRUG-ADR combination.

x <- ifelse (dat$DRUG == "D1" & dat$ADR == "A1",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D1" & dat$ADR == "A1",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D1" & dat$ADR != "A1",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)

So for record number 4 (D3-A6) the syntax would be:

x <- ifelse (dat$DRUG == "D3" & dat$ADR == "A6",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D3" & dat$ADR == "A6",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D3" & dat$ADR != "A6",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)
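As an aside, the ifelse()/is.na() round trip above can be replaced by plain logical indexing, which returns the matching X values directly. This is a minimal sketch for the D1-A1 case. The set.seed() call is an addition so the example is reproducible, and is_drug/is_adr are illustrative names, not part of the original code:

```r
# Example data from the question; set.seed() is added only for reproducibility.
set.seed(42)
dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))

# Illustrative flags for the D1-A1 combination.
is_drug <- dat$DRUG == "D1"
is_adr  <- dat$ADR == "A1"

# Logical indexing keeps only the matching X values, so no NA clean-up is needed.
x <- dat$X[is_drug & is_adr]     # D1-A1
y <- dat$X[!is_drug & is_adr]    # D<>1-A1
z <- dat$X[is_drug & !is_adr]    # D1-A<>1

t.test(x, y)
t.test(x, z)
```

The vectors are the same as those produced by the ifelse() version, just without the intermediate NAs.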

Does anyone have a good idea for a general function?

EDIT: My ideal result would be the following table:

  Drug ADR pvalue1 pvalue2
1   D1  A1  pval11  pval21
2   D2  A2  pval12  pval22
3  D.. A.. pval1.. pval2..

1 Answer

As with most programming problems, the solution comes in two steps:

  1. Abstract your logic to make it general
  2. Encapsulate the abstract solution into a reusable function

Then you can proceed to:

  3. Call the function repeatedly on all data.

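First, though: some of the t-tests fail because of insufficient data (for example, when a combination has no comparison group at all), so let's replace the t.test calls with a wrapper that returns the p-value, or NA if the test fails: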

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}
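To see what the wrapper does, here is a small check with made-up vectors. The values are arbitrary and only illustrate the cases. With enough observations the wrapper returns an ordinary p-value. When t.test() stops with an error, it returns NA instead. That happens, for example, when one group has a single value (the default Welch test needs at least two per group) or when a group is empty:

```r
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Enough observations in both groups: a regular p-value is returned.
p_ok <- t_test(c(5, 7, 9, 11), c(20, 22, 25, 27))

# A single observation in x makes t.test() stop with an error; the wrapper returns NA.
p_single <- t_test(5, c(20, 22, 25))

# An empty group (a combination with no matching rows) is caught the same way.
p_empty <- t_test(numeric(0), c(20, 22, 25))

c(p_ok = p_ok, p_single = p_single, p_empty = p_empty)
```

One trade-off is that error = function (err) NA swallows every error, including ones caused by genuine bugs. It can therefore be worth checking which combinations came back as NA.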

Then, all taken together, this gives us:

library(dplyr) # Makes data manipulation easier.

test_combination = function (data, id) {
    drug = data[id, ]$DRUG
    adr = data[id, ]$ADR

    match = filter(data, DRUG == drug, ADR == adr)$X
    mismatch1 = filter(data, DRUG != drug, ADR == adr)$X
    mismatch2 = filter(data, DRUG == drug, ADR != adr)$X

    list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
}
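If you would rather avoid the dplyr dependency, the same logic can be written in base R. This is a minimal sketch, not the answer's code. test_pair() is a hypothetical name, and it takes the DRUG and ADR labels directly instead of a row index. It reproduces the two cases from the question, D1-A1 and D3-A6. For D3-A6, no other drug has ADR A6, so the first test cannot run and comes back as NA. The set.seed() call is added only for reproducibility:

```r
# Example data from the question; set.seed() is added only for reproducibility.
set.seed(42)
dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))

# The answer's wrapper: p-value, or NA if t.test() fails.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical base-R counterpart of test_combination(): takes the DRUG and
# ADR labels instead of a row index, so dplyr is not needed.
test_pair <- function(data, drug, adr) {
    is_drug <- data$DRUG == drug
    is_adr  <- data$ADR == adr
    match     <- data$X[is_drug & is_adr]    # this drug, this ADR
    mismatch1 <- data$X[!is_drug & is_adr]   # other drugs, same ADR
    mismatch2 <- data$X[is_drug & !is_adr]   # same drug, other ADRs
    list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
}

d1_a1 <- test_pair(dat, "D1", "A1")
d3_a6 <- test_pair(dat, "D3", "A6")   # pval1 is NA: no other drug has A6
str(d3_a6)
```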

test_combination() tests a single combination, identified by its row index id. Now we test all of them:

result = lapply(dat$ID, test_combination, data = dat) %>%
    bind_rows() %>%
    bind_cols(dat, .) %>%
    select(-X)

Or, using a more dplyr-like (but in my opinion somewhat obscure) approach. Note that do() has been superseded since dplyr 1.0.0:

result = dat %>%
    rowwise() %>%
    do(bind_rows(test_combination(dat, .$ID))) %>%
    bind_cols(dat, .) %>%
    select(-X)
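Both versions above run one pair of tests per row of dat, so every DRUG-ADR combination appears once for each row that carries it (ten times in the example data). To get one row per unique combination, as in the table in the question, you can first reduce the data to its distinct combinations. Below is a base-R sketch under these assumptions: test_pair() is a hypothetical helper that performs the same two tests as test_combination() but takes the DRUG and ADR labels instead of a row index, the pvalue1/pvalue2 column names follow the question's table, and set.seed() is added for reproducibility:

```r
set.seed(42)
dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical helper: same two tests as test_combination(), keyed by labels.
test_pair <- function(data, drug, adr) {
    is_drug <- data$DRUG == drug
    is_adr  <- data$ADR == adr
    match <- data$X[is_drug & is_adr]
    list(pval1 = t_test(match, data$X[!is_drug & is_adr]),
         pval2 = t_test(match, data$X[is_drug & !is_adr]))
}

# One row per distinct DRUG-ADR pair (9 in the example data).
combos <- unique(dat[, c("DRUG", "ADR")])
rownames(combos) <- NULL

# vapply() returns a 2 x n matrix; NA results are promoted to double.
pvals <- vapply(seq_len(nrow(combos)), function(i) {
    res <- test_pair(dat, combos$DRUG[i], combos$ADR[i])
    c(pvalue1 = res$pval1, pvalue2 = res$pval2)
}, FUN.VALUE = c(pvalue1 = 0, pvalue2 = 0))

result <- cbind(combos, t(pvals))
result
```

The same idea works with the dplyr code by starting from distinct(dat, DRUG, ADR) instead of dat.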

Note how this code doesn’t use explicit for loops. This is how you process data in R: you apply a function to items in a table or list, rather than iterating manually.

Note that the above is highly questionable, statistically speaking: it runs two tests for every DRUG-ADR combination, so at the very least you need to apply a rigorous multiple testing correction (for example with p.adjust()).
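Here is a sketch of one way to do the correction with base R's p.adjust(). It assumes that all p-values from both columns form one family and uses Holm's method. Whether that family and that method are appropriate (versus, say, method = "BH" for false discovery rate control) is a statistical judgement the code does not make for you. test_pair() is again a hypothetical base-R stand-in for test_combination(), set.seed() is added for reproducibility, and NA p-values stay NA after adjustment:

```r
set.seed(42)
dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical base-R stand-in for test_combination(), keyed by labels.
test_pair <- function(data, drug, adr) {
    is_drug <- data$DRUG == drug
    is_adr  <- data$ADR == adr
    match <- data$X[is_drug & is_adr]
    list(pval1 = t_test(match, data$X[!is_drug & is_adr]),
         pval2 = t_test(match, data$X[is_drug & !is_adr]))
}

combos <- unique(dat[, c("DRUG", "ADR")])
rownames(combos) <- NULL
pvals <- vapply(seq_len(nrow(combos)), function(i) {
    res <- test_pair(dat, combos$DRUG[i], combos$ADR[i])
    c(pvalue1 = res$pval1, pvalue2 = res$pval2)
}, FUN.VALUE = c(pvalue1 = 0, pvalue2 = 0))
result <- cbind(combos, t(pvals))

# Adjust both columns together as one family of tests (Holm's method).
n <- nrow(result)
adjusted <- p.adjust(c(result$pvalue1, result$pvalue2), method = "holm")
result$padj1 <- adjusted[seq_len(n)]
result$padj2 <- adjusted[n + seq_len(n)]
result
```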
