I am trying to correlate several variables according to a specific group (COUNTY) in R. Although I am able to successfully find the correlation for each column through this method, I can't seem to find a way to save the p-value to the table for each group. Any suggestions?

Example Data:

crops <- data.frame(
    COUNTY = sample(37001:37900), 
    CropYield = sample(c(1:100), 10, replace = TRUE), 
    MaxTemp =sample(c(40:80), 10, replace = TRUE),
    precip =sample(c(0:10), 10, replace = TRUE), 
    ColdDays =sample(c(1:73), 10, replace = TRUE))

Example Code:

crops %>% 
     group_by(COUNTY) %>%
     do(data.frame(Cor=t(cor(.[,2:5], .[,2]))))

^This gives me the correlation for each column but I need to know the p-value for each one as well. Ideally the final output would look like this.

Desired Output

  • Please provide reproducible examples so we can help you. Commented Mar 9, 2020 at 20:20
  • @eonurk I have added more information! Hope this helps Commented Mar 9, 2020 at 20:43

1 Answer

You only have one observation per COUNTY, so a within-county correlation cannot be computed. I generated example data with several rows per COUNTY instead:

set.seed(111)
crops <- data.frame(
    COUNTY = sample(37001:37002,10,replace=TRUE), 
    CropYield = sample(c(1:100), 10, replace = TRUE), 
    MaxTemp =sample(c(40:80), 10, replace = TRUE),
    precip =sample(c(0:10), 10, replace = TRUE), 
    ColdDays =sample(c(1:73), 10, replace = TRUE))
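To check whether each group is large enough before correlating, something like the following base R sketch may help. The `toy` data frame and the `too_small` and `keep` names are hypothetical and only for illustration; the threshold of 3 reflects that `cor.test()` stops with an error when given fewer than three paired observations.

```r
# Hypothetical toy data: county 37003 has only two rows.
toy <- data.frame(
  COUNTY    = c(37001, 37001, 37001, 37002, 37002, 37002, 37003, 37003),
  CropYield = c(10, 20, 35, 15, 30, 25, 40, 50),
  MaxTemp   = c(60, 64, 70, 55, 61, 58, 72, 75)
)

# Count rows per county and flag groups with fewer than 3 observations.
group_sizes <- table(toy$COUNTY)
too_small <- names(group_sizes)[group_sizes < 3]
print(too_small)

# cor.test() raises an error on such a group; capture the message instead of stopping.
two_rows <- toy[toy$COUNTY == 37003, ]
res <- tryCatch(
  cor.test(two_rows$CropYield, two_rows$MaxTemp),
  error = function(e) conditionMessage(e)
)
print(res)

# Keep only the counties that can be tested.
keep <- toy[!as.character(toy$COUNTY) %in% too_small, ]
```

Dropping or flagging the small groups first avoids the whole grouped pipeline failing on a single county.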

I think you need to convert to long format and run cor.test for each COUNTY and variable:

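library(dplyr)
library(tidyr)

# Correlate CropYield with one variable; keep the estimate and the p-value
calcor <- function(da) {
  data.frame(cor.test(da$CropYield, da$value)[c("estimate", "p.value")])
}

crops %>%
  pivot_longer(-c(COUNTY, CropYield)) %>%  # one row per observation and variable
  group_by(COUNTY, name) %>%
  do(calcor(.))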

# A tibble: 6 x 4
# Groups:   COUNTY, name [6]
  COUNTY name     estimate p.value
   <int> <chr>       <dbl>   <dbl>
1  37001 ColdDays    0.466   0.292
2  37001 MaxTemp    -0.225   0.628
3  37001 precip     -0.356   0.433
4  37002 ColdDays    0.888   0.304
5  37002 MaxTemp     0.941   0.220
6  37002 precip     -0.489   0.674
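For reference, here is a small sketch of what a single `cor.test()` call returns and why indexing it with `c("estimate", "p.value")` yields a one-row data frame. The `yield` and `temp` vectors are made-up values standing in for one county/variable pair.

```r
# Hypothetical values for one county/variable pair.
yield <- c(12, 25, 31, 47, 58)
temp  <- c(61, 64, 63, 70, 72)

# cor.test() uses Pearson's correlation by default and returns an "htest" list.
ct <- cor.test(yield, temp)
print(class(ct))

# Pick the two list elements of interest and turn them into a one-row data frame.
picked <- data.frame(ct[c("estimate", "p.value")])
print(picked)
```

The `estimate` element is the same number `cor(yield, temp)` would give; the `p.value` element is what `cor()` alone does not provide.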

The do() call above gives you the correlation of every variable against crop yield, for every county. Now it is a matter of converting the result to wide format:

crops %>%
  pivot_longer(-c(COUNTY, CropYield)) %>%
  group_by(COUNTY, name) %>%
  do(calcor(.)) %>%
  pivot_wider(values_from = c(estimate, p.value), names_from = name)

  COUNTY estimate_ColdDa… estimate_MaxTemp estimate_precip p.value_ColdDays
   <int>            <dbl>            <dbl>           <dbl>            <dbl>
1  37001            0.466           -0.225          -0.356            0.292
2  37002            0.888            0.941          -0.489            0.304
# … with 2 more variables: p.value_MaxTemp <dbl>, p.value_precip <dbl>
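As a possible variation, the same result can be sketched with `summarise()` instead of `do()`. This is an alternative, not the code used above, and the `toy` data frame is hypothetical (five rows per county, fixed values so the result does not depend on the random seed).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with five rows per county (not the question's data).
toy <- data.frame(
  COUNTY    = rep(c(37001, 37002), each = 5),
  CropYield = c(10, 20, 30, 40, 50, 15, 25, 20, 40, 35),
  MaxTemp   = c(60, 62, 65, 70, 68, 55, 58, 61, 59, 66),
  precip    = c(1, 3, 2, 5, 4, 8, 6, 7, 3, 2),
  ColdDays  = c(30, 25, 28, 12, 10, 40, 35, 33, 20, 22)
)

result <- toy %>%
  pivot_longer(-c(COUNTY, CropYield)) %>%   # long format: one row per variable value
  group_by(COUNTY, name) %>%
  summarise(
    estimate = unname(cor.test(CropYield, value)$estimate),  # correlation coefficient
    p.value  = cor.test(CropYield, value)$p.value,           # matching p-value
    .groups  = "drop"
  ) %>%
  pivot_wider(names_from = name, values_from = c(estimate, p.value))

print(result)
```

This calls `cor.test()` twice per group, which is wasteful on large data but keeps the sketch short; storing the test object once per group would avoid the repeat.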