94

I have 92 data sets of the same type.

I want to build a correlation matrix covering every possible pair of them,

i.e., a 92x92 matrix

in which element (ci, cj) is the correlation between ci and cj.

How do I do that?

4
  • 6
    Have a look at the cor function, or at the rcorr function in the Hmisc package. Commented May 21, 2012 at 7:08
  • 1
    I'm able to find the correlation between two parameters. The question is how to arrange them in a matrix. Commented May 21, 2012 at 7:16
  • 8
    How on Earth did this get so many upvotes? Commented May 22, 2016 at 21:31
  • @anon Including correlation matrices is common in scientific articles. Commented Feb 25 at 8:41

6 Answers

108

An example,

d <- data.frame(x1 = rnorm(10),
                x2 = rnorm(10),
                x3 = rnorm(10))
cor(d) # get correlations (returns a matrix)
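
For the 92 data sets in the question, the same idea can be sketched as follows. This is only an illustration under assumptions: the list `datasets`, the names `c1` to `c92`, and the random values are hypothetical stand-ins, and the sets are assumed to be numeric vectors of equal length.

```r
# Hypothetical stand-in for the 92 data sets:
# a list of equal-length numeric vectors named c1 ... c92.
set.seed(42)
datasets <- lapply(1:92, function(i) rnorm(50))
names(datasets) <- paste0("c", 1:92)

# Bind the sets side by side so that each set becomes one column.
m <- do.call(cbind, datasets)

# cor() on a matrix correlates every pair of columns.
cm <- cor(m)

dim(cm)         # one row and one column per data set
cm["c1", "c2"]  # correlation between set c1 and set c2
```

Because `cor()` works column-wise, the only real work is getting each data set into its own column; the matrix arrangement then comes for free.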

77

You could use the 'corrplot' package.

d <- data.frame(x1 = rnorm(10),
                x2 = rnorm(10),
                x3 = rnorm(10))
M <- cor(d) # get correlations

library('corrplot') # package corrplot
corrplot(M, method = "circle") # plot matrix


More information here: http://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html

2 Comments

Is it possible to obtain a graph similar to these cran.r-project.org/web/packages/corrplot/vignettes/…, or a simple matrix, but with R-squared instead of the Pearson, Kendall, or Spearman correlation?
For a simple linear regression between two variables, R2 equals the square of the Pearson correlation coefficient. So all you need to do is square M element-wise (M * M or M^2, not the matrix product M %*% M) before creating the plot.
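
A minimal sketch of that suggestion, assuming the same toy data frame `d` as in the answer above (the name `R2` is introduced here for illustration). The plotting call is guarded so the snippet still runs when corrplot is not installed.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(10),
                x2 = rnorm(10),
                x3 = rnorm(10))
M <- cor(d)   # Pearson correlations

# Element-wise square: each entry becomes r^2.
# Note that M %*% M would be a matrix product, which is not what is wanted here.
R2 <- M^2

# Plot only if the corrplot package is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(R2, method = "circle")
}
```

Each off-diagonal entry of `R2` should match the R-squared of a simple linear regression of one variable on the other, e.g. `summary(lm(x1 ~ x2, data = d))$r.squared`.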
18

The cor function computes correlations between columns. With two matrices, cor(x, y) correlates every column of x with every column of y, so x and y must have the same number of rows. Ex.:

set.seed(1)
x <- matrix(rnorm(20), nrow=5, ncol=4)
y <- matrix(rnorm(15), nrow=5, ncol=3)
COR <- cor(x,y)
COR
image(x=seq(dim(x)[2]), y=seq(dim(y)[2]), z=COR, xlab="x column", ylab="y column")
text(expand.grid(x=seq(dim(x)[2]), y=seq(dim(y)[2])), labels=round(c(COR),2))

enter image description here

Edit:

Here is an example of custom row and column labels on a correlation matrix calculated with a single matrix:

png("corplot.png", width=5, height=5, units="in", res=200)
op <- par(mar=c(6,6,1,1), ps=10)
COR <- cor(iris[,1:4])
# reuse COR instead of recomputing it; the axes are drawn manually below
image(x=seq(nrow(COR)), y=seq(ncol(COR)), z=COR, axes=FALSE, xlab="", ylab="")
text(expand.grid(x=seq(dim(COR)[1]), y=seq(dim(COR)[2])), labels=round(c(COR),2))
box()
axis(1, at=seq(nrow(COR)), labels = rownames(COR), las=2)
axis(2, at=seq(ncol(COR)), labels = colnames(COR), las=1)
par(op)
dev.off()

enter image description here

5 Comments

@Manuel Ramón's example is probably best for your case (a single matrix): organize your data sets as columns.
In the image above, how can one 'invert' the colors, so that a cell is red when the correlation is close to -1 or 1 and white when it is close to 0?
image(x=seq(dim(x)[2]), y=seq(dim(y)[2]), z=COR, col=rev(heat.colors(20)), xlab="x column", ylab="y column")
@Marcinthebox how would you add variable labels to the x and y axes (instead of numbers)? Thanks
@AgustínIndaco - I have updated my answer with a further example. The image function doesn't automatically take the row and column names, so this must be added.
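
Regarding the color question in the comments: rev(heat.colors(20)) runs from pale at the lowest value to red at the highest, so strongly negative correlations still come out pale. One possible way to get red at both extremes and white near 0 is a symmetric palette pinned to the range -1 to 1. This is a sketch, not the answer author's method; the palette `pal` and its 21 steps are choices made here for illustration.

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
COR <- cor(x, y)

# Palette running red -> white -> red, so both extremes are red.
# An odd number of steps puts pure white exactly in the middle.
pal <- colorRampPalette(c("red", "white", "red"))(21)

# zlim = c(-1, 1) pins the middle of the palette (white) to a correlation of 0.
image(x = seq(ncol(x)), y = seq(ncol(y)), z = COR,
      col = pal, zlim = c(-1, 1),
      xlab = "x column", ylab = "y column")
text(expand.grid(x = seq(ncol(x)), y = seq(ncol(y))),
     labels = round(c(COR), 2))
```

Without the fixed `zlim`, image() would stretch the palette over the observed range of `COR`, and white would no longer correspond to zero.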
16

Have a look at qtlcharts. It allows you to create interactive correlation matrices:

library(qtlcharts)
data(iris)
iris$Species <- NULL
iplotCorr(iris, reorder=TRUE)


It's more impressive when you correlate more variables, as in the package's vignette.


2

There are other ways to achieve this here: (Plot correlation matrix into a graph), but I like your version with the correlations in the boxes. Is there a way to add the variable names to the x and y column instead of just those index numbers? For me, that would make this a perfect solution. Thanks!

edit: I was trying to comment on the post by [Marc in the box], but I clearly don't know what I'm doing. However, I did manage to answer this question for myself.

If d is the matrix (or the original data frame) and its column names are the labels you want, then the following works:

axis(1, 1:dim(d)[2], colnames(d), las=2)
axis(2, 1:dim(d)[2], colnames(d), las=2)

las=0 would flip the names back to their normal position. Mine were long, so I used las=2 to make them perpendicular to the axis.

edit2: to stop the image() function from printing numbers along the axes (otherwise they overlap your variable labels), add xaxt='n' (and yaxt='n' for the y axis), e.g.:

image(x=seq(dim(x)[2]), y=seq(dim(y)[2]), z=COR, col=rev(heat.colors(20)), xlab="x column", ylab="y column", xaxt='n', yaxt='n')


0

Have a look at the datasummary_correlation (article, vignette) function of modelsummary.

Here is an example of a typical correlation table:

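library(correlation)
library(modelsummary)
library(tidyverse)

# custom function: takes a data frame, returns a character matrix of correlations
fun <- function(x) {
  out <- x |>
    correlation() |>
    summary(redundant = TRUE) |>
    format(digits=2)  |>
    as.matrix()
  # the first column holds the variable names; move it into the row names
  row.names(out) <- out[, 1]
  out <- out[, 2:ncol(out)]
  # blank out the upper triangle and force the diagonal to "1.00"
  ut <- upper.tri(out)
  out[ut] <- ""
  diag(out) <- rep("1.00", nrow(out))
  return(out)
}

datasummary_correlation(
  mtcars,
  method = fun)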
