2

I am attempting to transform columns to rows in R, without the use of reshape (can't install the package). The data I have received includes attributes and their corresponding metrics. I would like to calculate the statistical correlation between all of these attributes--16000 in total with 8 million records. Not all records have the same number of attributes.

To do this, I believe I will have to convert columns to rows so that I can eventually use the cor function e.g. cor(x[,1], x[,2:16000]). This may be wholly unnecessary if there is some way to use the cor function by attribute i.e. correlation between attribute 1 and 2, attribute 1 and 3, attribute 1 ... N. Any help would be much appreciated.

 ID          Attribute  Metric1 
 [1,]  1         1 -1.6363007
 [2,]  2         1  1.1483294
 [3,]  3         1  2.1682566
 [4,]  4         1 -1.1823649
 [5,]  5         1 -1.3631378
 [6,]  1         2 -1.1715544
 [7,]  2         2  1.5164278
 [8,]  3         2 -1.0110274
 [9,]  4         2 -0.9421652
[10,]  5         2 -0.2105443
[11,]  6         2 -0.4143548
[12,]  7         2 -1.6170975
[13,]  8         2  1.2402303
[14,]  9         2  0.4460047
[15,]  7         3  0.1060407
[16,]  8         3  0.9796893
[17,]  9         3  0.9254911
[18,] 10         3 -1.5728600
[19,] 11         3 -0.8082675
[20,] 12         3 -1.8643084

Transformation:

ID  attribute1  attribute2  attribute3
1   -1.6363007  -1.1715544  na
2   1.1483294   1.5164278   na
3   2.1682566   -1.0110274  na
4   -1.1823649  -0.9421652  na
5   -1.3631378  -0.2105443  na
6   na          -0.4143548  na
7   na          -1.6170975  0.1060407
8   na           1.2402303  0.9796893
9   na           0.4460047  0.9254911
10  na           na         -1.57286
11  na           na         -0.8082675
12  na           na         -1.8643084


test <- cbind(c(rep(1,5),rep(2,9),rep(3,6)), replicate(1,rnorm(20)))
test <- cbind(c(1:5,1:9,7:12),test)

@Aaron

q <- matrix(nrow=20,ncol=3)
colnames(q) <- c("x","y","z")
q[,3] <- replicate(1, rnorm(20))
q[,2] <- c(101,102,103,104,105,106, 107, 108, 101,103,107,109, 104,110,102,103,106,109,108,112)
q[15:20,1] <- 10000003
q[9:14,1] <- 10000002
q[1:8,1] <- 10000001
q <- data.frame(q)
q$x <- factor(q$x)
q$y <- factor(q$y)
q$z <- factor(q$z)

with(q, {
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  out[cbind(x, y)] <- z
  out
})
8
  • Could you provide a reproducible example of the starting data using dput? Commented Jan 7, 2013 at 18:51
  • Also, you won't have any luck with cor if there are NAs in your resulting dataframe. Commented Jan 7, 2013 at 18:52
  • 2
    There are lots of ways to do this, even without the handy reshape package; this answer has collected many of them in one place: stackoverflow.com/a/9617424/210673 Commented Jan 7, 2013 at 18:56
  • @Aaron Per your matrix indexing example, is there a way to include the values for z rather than the indexes? Commented Jan 8, 2013 at 17:04
  • 1
    Just don't make z a factor. If you want it numeric, leave it that way. If you want it a character, use as.character. Commented Jan 8, 2013 at 19:33

2 Answers 2

5

No need for the "reshape" or "reshape2" packages. Just use base R reshape(). Assuming your data.frame is named "temp":

reshape(temp, direction = "wide", idvar="ID", timevar="Attribute")
#       ID Metric1.1  Metric1.2  Metric1.3
# [1,]   1 -1.636301 -1.1715544         NA
# [2,]   2  1.148329  1.5164278         NA
# [3,]   3  2.168257 -1.0110274         NA
# [4,]   4 -1.182365 -0.9421652         NA
# [5,]   5 -1.363138 -0.2105443         NA
# [11,]  6        NA -0.4143548         NA
# [12,]  7        NA -1.6170975  0.1060407
# [13,]  8        NA  1.2402303  0.9796893
# [14,]  9        NA  0.4460047  0.9254911
# [18,] 10        NA         NA -1.5728600
# [19,] 11        NA         NA -0.8082675
# [20,] 12        NA         NA -1.8643084

If your data are a matrix and not a data.frame, you will need to convert it to a data.frame before using reshape(), or you can use xtabs(). However, using xtabs() creates zeroes instead of NAs. Here's the xtabs() approach:

xtabs(Metric1 ~ ID + Attribute, tempm)
#     Attribute
# ID            1          2          3
#   1  -1.6363007 -1.1715544  0.0000000
#   2   1.1483294  1.5164278  0.0000000
#   3   2.1682566 -1.0110274  0.0000000
#   4  -1.1823649 -0.9421652  0.0000000
#   5  -1.3631378 -0.2105443  0.0000000
#   6   0.0000000 -0.4143548  0.0000000
#   7   0.0000000 -1.6170975  0.1060407
#   8   0.0000000  1.2402303  0.9796893
#   9   0.0000000  0.4460047  0.9254911
#   10  0.0000000  0.0000000 -1.5728600
#   11  0.0000000  0.0000000 -0.8082675
#   12  0.0000000  0.0000000 -1.8643084 
Sign up to request clarification or add additional context in comments.

5 Comments

When I try to implement this line I am prompted with the error message: "Error in match(rval[, idvar], thistime[, idvar]) : subscript out of bounds"
@user1636475, is your input a data.frame or a matrix? Try converting it to a data.frame before using the example code provided.
I thought that was the issue, but definitely a matrix. temp<- cbind(c(rep(1,5),rep(2,9),rep(3,6)), replicate(1,rnorm(20))) temp <- cbind(c(1:5,1:9,7:12),test) colnames(temp) <- c("ID", "Attribute", "Metric1") reshape(temp, direction = "wide", idvar="ID", timevar="Attribute")
@user1636475, as I mentioned, reshape() expects a data.frame, so try reshape(data.frame(YOURMATRIXNAME, etc instead.
@user1636475, you may want to also start accepting answers to some of your questions. I notice that you've asked four questions on Stack Overflow, but have not marked any of the answers in them as accepted (look for the checkmark next to the answers). Of course, do this only if the answers meet your expectations.
0

using reshape2

  acast(dat, ID~Attribute, value.var="Metric1")
           1          2          3
1  -1.636301 -1.1715544         NA
2   1.148329  1.5164278         NA
3   2.168257 -1.0110274         NA
4  -1.182365 -0.9421652         NA
5  -1.363138 -0.2105443         NA
6         NA -0.4143548         NA
7         NA -1.6170975  0.1060407
8         NA  1.2402303  0.9796893
9         NA  0.4460047  0.9254911
10        NA         NA -1.5728600
11        NA         NA -0.8082675
12        NA         NA -1.8643084

2 Comments

I'm assuming that if they can't install the "reshape" package, they also can't install the "reshape2" package ;)
@AnandaMahto I assume nothing not existing in the question :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.