Create dataframe with correlation and p-value by group?

Question

I am trying to correlate several variables according to a specific group (COUNTY) in R. Although I am able to successfully find the correlation for each column through this method, I can't seem to find a way to save the p-value to the table for each group. Any suggestions?

Example Data:

crops <- data.frame(
    COUNTY = sample(37001:37900), 
    CropYield = sample(c(1:100), 10, replace = TRUE), 
    MaxTemp =sample(c(40:80), 10, replace = TRUE),
    precip =sample(c(0:10), 10, replace = TRUE), 
    ColdDays =sample(c(1:73), 10, replace = TRUE))

Example Code:

library(dplyr)

crops %>% 
     group_by(COUNTY) %>%
     do(data.frame(Cor=t(cor(.[,2:5], .[,2]))))

^This gives me the correlation for each column but I need to know the p-value for each one as well. Ideally the final output would look like this.

Desired Output

Please provide reproducible examples so we can help you.

– eonurk, Commented Mar 9, 2020 at 20:20
@eonurk I have added more information! Hope this helps

– m1994, Commented Mar 9, 2020 at 20:43

StupidWolf · Accepted Answer · 2020-03-09 23:18:57Z

You only have 1 observation per COUNTY, so it will not work: a correlation cannot be computed from a single point. I made example data with several observations per COUNTY:

set.seed(111)
crops <- data.frame(
    COUNTY = sample(37001:37002,10,replace=TRUE), 
    CropYield = sample(c(1:100), 10, replace = TRUE), 
    MaxTemp =sample(c(40:80), 10, replace = TRUE),
    precip =sample(c(0:10), 10, replace = TRUE), 
    ColdDays =sample(c(1:73), 10, replace = TRUE))
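Before correlating, it can help to check how many rows each COUNTY actually has. The sketch below is an addition to the answer, not part of it: the names obs_per_county, n_obs and crops_ok are made up for illustration, and the threshold of 3 is an assumption based on cor.test with the default Pearson method stopping with an error when a group has fewer than three complete pairs.

```r
library(dplyr)

# Same example data as above
set.seed(111)
crops <- data.frame(
  COUNTY = sample(37001:37002, 10, replace = TRUE),
  CropYield = sample(1:100, 10, replace = TRUE),
  MaxTemp = sample(40:80, 10, replace = TRUE),
  precip = sample(0:10, 10, replace = TRUE),
  ColdDays = sample(1:73, 10, replace = TRUE)
)

# Number of observations available for each county
obs_per_county <- count(crops, COUNTY, name = "n_obs")
print(obs_per_county)

# Keep only counties with at least 3 rows (assumed minimum for Pearson cor.test)
crops_ok <- crops %>%
  group_by(COUNTY) %>%
  filter(n() >= 3) %>%
  ungroup()
```

Running the same count on the data from the question should show one row per COUNTY, which is why no p-value can be computed there.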

I think you need to convert to long format and run cor.test per COUNTY and variable:

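library(dplyr)
library(tidyr)
# Correlation estimate and p-value of CropYield against the long-format value column
calcor <- function(da) {
  data.frame(cor.test(da$CropYield, da$value)[c("estimate", "p.value")])
}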

crops %>%
  pivot_longer(-c(COUNTY, CropYield)) %>%
  group_by(COUNTY, name) %>%
  do(calcor(.))

# A tibble: 6 x 4
# Groups:   COUNTY, name [6]
  COUNTY name     estimate p.value
   <int> <chr>       <dbl>   <dbl>
1  37001 ColdDays    0.466   0.292
2  37001 MaxTemp    -0.225   0.628
3  37001 precip     -0.356   0.433
4  37002 ColdDays    0.888   0.304
5  37002 MaxTemp     0.941   0.220
6  37002 precip     -0.489   0.674

The above gives you the correlation of every variable with crop yield, for every county. Now it's a matter of converting it to wide format:

crops %>%
  pivot_longer(-c(COUNTY, CropYield)) %>%
  group_by(COUNTY, name) %>%
  do(calcor(.)) %>%
  pivot_wider(values_from = c(estimate, p.value), names_from = name)

  COUNTY estimate_ColdDa… estimate_MaxTemp estimate_precip p.value_ColdDays
   <int>            <dbl>            <dbl>           <dbl>            <dbl>
1  37001            0.466           -0.225          -0.356            0.292
2  37002            0.888            0.941          -0.489            0.304
# … with 2 more variables: p.value_MaxTemp <dbl>, p.value_precip <dbl>
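As a possible variation (not the code from the answer), the same table can be built with summarise() instead of do(), which dplyr marks as superseded since version 1.0.0. This is a minimal sketch assuming dplyr >= 1.0.0 for the .groups argument; the name cor_by_county is made up for illustration, and cor.test is called twice per group only to keep the example short.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer
set.seed(111)
crops <- data.frame(
  COUNTY = sample(37001:37002, 10, replace = TRUE),
  CropYield = sample(1:100, 10, replace = TRUE),
  MaxTemp = sample(40:80, 10, replace = TRUE),
  precip = sample(0:10, 10, replace = TRUE),
  ColdDays = sample(1:73, 10, replace = TRUE)
)

cor_by_county <- crops %>%
  # One row per COUNTY / variable / value, keeping CropYield alongside
  pivot_longer(-c(COUNTY, CropYield)) %>%
  group_by(COUNTY, name) %>%
  # One correlation estimate and one p-value per COUNTY and variable
  summarise(
    estimate = unname(cor.test(CropYield, value)$estimate),
    p.value  = cor.test(CropYield, value)$p.value,
    .groups = "drop"
  ) %>%
  # Back to one row per COUNTY, one column per statistic and variable
  pivot_wider(names_from = name, values_from = c(estimate, p.value))

print(cor_by_county)
```

The column names follow the same estimate_<variable> and p.value_<variable> pattern as the wide output above.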
