4

I have the following dataframe:

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)

And I want to end up with something like this table (values are random):

Species  Pop.density  %Resistance  CI_low  CI_high   Total samples
avi      low          2.0          1.2     2.2       30
avi      med          0            0       0.5       20
avi      high         3.5          2.9     4.2       10
chi      low          0.5          0.3     0.7       20
chi      med          2.0          1.9     2.1       150
chi      high         6.5          6.2     6.6       175

The % resistance column is based on the col3 above, where 1 = resistant, and 0 = nonresistant. I have tried the following:

library(dplyr)
test_data<-test_data %>%
  count(col1,col2,col3) %>%
  group_by(col1, col2) %>%
  mutate(perc_res = prop.table(n)*100)

I tried this, and it seem to almost do the trick, as i get the percentage of total 1s and 0s in col3, for each value in col1 and 2, however the total samples is wrong since I am counting all three columns, when the correct count would be for only col1 and 2.

For the confidence interval i would use the following:

binom.test(resistant samples,total samples)$conf.int*100

However I am not sure how to implement it together with the rest. Is there a simple and quick way to do this?

4
  • I'd suggest using the group_by and then the summarise function. Commented Sep 15, 2017 at 14:59
  • Use data.frame(col1, col2, col3), not cbind, which forces every column to string here. Commented Sep 15, 2017 at 15:25
  • Your example data does not have an ("avi", "high") pair. Would you want that row to appear anyways (with NAs and zero samples count)? Commented Sep 15, 2017 at 15:36
  • 1
    If they do not exist, I do not need them to appear. Commented Sep 16, 2017 at 18:31

2 Answers 2

6

I'd do...

library(data.table)
setDT(DT)

DT[, { 
  bt <- binom.test(sum(resists), .N)$conf.int*100
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, keyby=.(species, popdens)]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     bov     low 100.00000 15.811388 100.00000 2
 4:     bov     med  50.00000  1.257912  98.74209 2
 5:     bov    high 100.00000 15.811388 100.00000 2
 6:     chi     low   0.00000  0.000000  97.50000 1
 7:     chi     med  50.00000  1.257912  98.74209 2
 8:     chi    high  66.66667  9.429932  99.15962 3
 9:     fox     low   0.00000  0.000000  97.50000 1
10:     fox     med  50.00000  1.257912  98.74209 2

To include all levels (combinations of species and population density)...

DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), {
  bt <- 
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100 
    else NA_real_
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)    
}, by=.EACHI]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     avi    high        NA        NA        NA 0
 4:     bov     low 100.00000 15.811388 100.00000 2
 5:     bov     med  50.00000  1.257912  98.74209 2
 6:     bov    high 100.00000 15.811388 100.00000 2
 7:     chi     low   0.00000  0.000000  97.50000 1
 8:     chi     med  50.00000  1.257912  98.74209 2
 9:     chi    high  66.66667  9.429932  99.15962 3
10:     fox     low   0.00000  0.000000  97.50000 1
11:     fox     med  50.00000  1.257912  98.74209 2
12:     fox    high        NA        NA        NA 0

How it works

The syntax is DT[i, j, by=] where ...

  • i determines a subset of rows, sometimes with helper arguments, on= or roll=.
  • by= determines groups within the subsetted table, switched to keyby= if sorting.
  • j is code acting on each group.

j should evaluate to a list, with .() being a shortcut to list(). See ?data.table for details.

Data used

(renamed columns, reformatted binary variable back to 0/1 or false/true, set levels of population density in the right order):

DT = data.frame(
  species = col1, 
  popdens = factor(col2, levels=c("low", "med", "high")), 
  resists = col3
)
Sign up to request clarification or add additional context in comments.

Comments

3

This ought to do it.

library(tidyverse)
library(broom)

test_data %>%
  mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>%
  count(col1, col2, col3) %>%
  spread(col3, n, fill = 0) %>%
  mutate(PercentResistant = Resistant / (NonResistant + Resistant)) %>%
  mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>%
  unnest() %>%
  transmute(Species = col1, Pop.density = col2, PercentResistant, CI_low = conf.low * 100, CI_high = conf.high * 100, TotalSamples = Resistant + NonResistant)
  1. Mutate the 0/1 resistance column so that it's got readable values.
  2. Count the values in each bucket.
  3. Spread col3/n into two columns, Resistant/NonResistant, and put the counts (n) into those columns. Now each row has everything it needs to do your test.
  4. Calculate the Percent Resistance
  5. Perform your test for each bucket and put the results in a nested frame called test.
  6. Unnest the test data frame so you can work with the test results.
  7. Clean up, give everything nice names.

Results

enter image description here

7 Comments

Great solution! Can I ask where conf.low comes from as after unnest() I see only estimate and statistic?
@PoGibas: conf.low comes from the tidy(), which is then unnested. If you see estimate it should be there. Wild guess, your window isn't that wide and there's something like "...with X more variables" below your results?
aaa, it's tbl_df, %>% data.frame() shows it. Can't get used to tibble
This is brilliant! Which part is from the "Broom" package? is it the unnest/nest and transmute?
@Haakonkas: broom turns models into data frames with the tidy() method.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.