
Coming from Java, I am trying to learn R (and some statistics). I am trying to reproduce a table from Jonathan Gillard: A First Course in Statistical Inference.

The table shows the possible results of two draws with replacement from a piggybank: piggybank <- c(5, 10, 10, 20, 50, 50)

With the following code I encounter some unexpected (at least for a Java programmer) behavior.

The first seven lines basically give me what I want, except that (5, 10) and (10, 5) should be aggregated into one category. I thought of using a set for this, but loading the sets library seems to break the first seven lines.

library(dplyr)
rm(list=ls())
piggybank <- c(5, 10, 10, 20, 50, 50)
draws <- expand.grid(d1=piggybank, d2=piggybank)
draws <- draws %>% rowwise() %>% mutate(sum=sum(c(d1,d2)), var=var(c(d1,d2)), mean=mean(c(d1,d2)))
draws <- draws %>% group_by(d1, d2, var, mean, sum) %>% summarise(n=n())
draws <- draws %>% ungroup() %>% mutate(P=n/sum(n))
nr <- nrow(draws)
aggdraws <- data.frame(x1x2=character(0), var=numeric(0), mean=numeric(0), sum=numeric(0), n=numeric(0))
str(aggdraws)
local(
  for (i in 1:nr) {
    newrow <<- data.frame(x1x2=character(1), var=numeric(1), mean=numeric(1), sum=numeric(1), n=numeric(10))
    newrow$n <- draws[i, ]$n
    newrow$var <- draws[i, ]$var
    newrow$mean <- draws[i, ]$mean
    newrow$sum <- draws[i, ]$mean
    
    newrow$x1x2 <- paste(min(draws[i, ]$d1, draws[i, ]$d2), max(draws[i, ]$d1, draws[i, ]$d2))
    
    #print(aggdraws)
    if (nrow(aggdraws) > 0) {
      for(j in 1:nrow(aggdraws)) {
        print(paste(aggdraws[j,]$x1x2, newrow$x1x2))
        if(aggdraws[j,]$x1x2 == newrow$x1x2) {
          aggdraws[j,]$n <- aggdraws[j,]$n +newrow$n
        } else {
          aggdraws[nrow(aggdraws)+1, ] <- newrow
        }
      }
    } else {
      aggdraws[nrow(aggdraws)+1, ] <- newrow
    }
  }
)

The line newrow <<- data.frame(x1x2=character(1), var=numeric(1), mean=numeric(1), sum=numeric(1), n=numeric(10)) creates a data.frame with 10 observations of 5 variables. Why? I need a data.frame with 1 observation.

newrow does not seem to be local to the for loop: it still holds the row from the previous iteration. I need a new instance in every iteration.

Probably because of this behavior, if(aggdraws[j,]$x1x2 == newrow$x1x2) never evaluates to TRUE.

Any help would be greatly appreciated. Is there a good book or other source which points out the pitfalls of R for a programmer coming from Java or another object-oriented language?

Thanks,

Hans

  • Hi there. I'm not really sure what the exact problem is. A good start with R is the book R for Data Science; for more advanced reading, Advanced R. Commented May 22, 2021 at 16:54
  • newrow <<- data.frame(x1x2=character(1), var=numeric(1), mean=numeric(1), sum=numeric(1), n=numeric(10)) --> n = numeric(10) creates a vector of length 10. Replace it with numeric(1) for a vector with length 1. Probably that's the source of your strange behaviour. Commented May 22, 2021 at 17:16
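To illustrate the comment above: data.frame() recycles shorter columns to the length of the longest one, so a single n = numeric(10) is enough to produce ten rows. The following is a minimal sketch in base R; the names ten_rows and one_row are made up for illustration and do not appear in the question.

```r
# data.frame() recycles length-1 columns to match the longest column.
ten_rows <- data.frame(x1x2 = character(1), n = numeric(10))  # n has length 10
one_row  <- data.frame(x1x2 = character(1), n = numeric(1))   # every column has length 1

nrow(ten_rows)  # 10: x1x2 was recycled ten times
nrow(one_row)   # 1
```

With numeric(1) for every column, the template row has exactly one observation.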

1 Answer


This is not a direct answer to your question. I took a look at your code and simplified the first part using dplyr:

library(dplyr)

piggybank <- c(5, 10, 10, 20, 50, 50)

draws <- expand.grid(d1=piggybank, d2=piggybank) %>% 
  rowwise() %>%
  # order each pair so that (5, 10) and (10, 5) fall into the same group
  mutate(d1_new = min(d1, d2),
         d2_new = max(d1, d2)) %>%
  select(d1 = d1_new, d2 = d2_new) %>%
  # mean() needs a single vector: mean(d1, d2) would treat d2 as the trim argument
  mutate(sum = sum(d1, d2), 
         var = var(c(d1, d2)), 
         mean = mean(c(d1, d2))) %>% 
  group_by(d1, d2, var, mean, sum) %>% 
  summarise(n = n(), .groups="drop") %>%
  mutate(P = n/sum(n))

returns

# A tibble: 10 x 7
      d1    d2    var  mean   sum     n      P
   <dbl> <dbl>  <dbl> <dbl> <dbl> <int>  <dbl>
 1     5     5    0     5      10     1 0.0278
 2     5    10   12.5   7.5    15     4 0.111 
 3     5    20  112.   12.5    25     2 0.0556
 4     5    50 1012.   27.5    55     4 0.111 
 5    10    10    0    10      20     4 0.111 
 6    10    20   50    15      30     4 0.111 
 7    10    50  800    30      60     8 0.222 
 8    20    20    0    20      40     1 0.0278
 9    20    50  450    35      70     4 0.111 
10    50    50    0    50     100     4 0.111 

which is pretty much your table from Jonathan Gillard: A First Course in Statistical Inference.
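On the scoping part of the question, a likely explanation (a sketch, not checked against the original session): <<- does not assign in the current environment but in an enclosing one, so inside local() it creates or resets a global newrow. The following newrow$n <- ... uses plain <-, which creates a separate local newrow, and that local copy survives into the next iteration of the loop. The same applies to aggdraws: assigning to it with <- inside local() only changes a local copy. The small base R example below reproduces the effect with a hypothetical variable called counter.

```r
# `<<-` assigns in an enclosing environment, `<-` in the current one.
result <- local({
  counter <<- 0           # creates counter in the enclosing (global) environment
  counter <- counter + 1  # reads the global 0, creates a LOCAL counter = 1
  counter <<- 0           # resets the global counter only
  counter <- counter + 1  # reads the stale local 1, local counter = 2
  counter
})

result   # 2, not 1: the "reset" never reached the local copy
counter  # 0: the global variable written by `<<-`
```

Using plain <- for newrow inside the loop gives a fresh value in every iteration.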
